Joseph Goguen


Preface 


Joseph Goguen is one of the most prominent computer scientists worldwide. 
His numerous research contributions span many topics and have changed the 
way we think about many concepts. Our views about data types, programming 
languages, software specification and verification, computational behavior, logics 
in computer science, semiotics, interface design, multimedia, and consciousness, 
to mention just some of the areas, have all been enriched in fundamental ways 
by his ideas. 

Considering just one strand of his work, namely, the area of algebraic spec- 
ifications, his ideas have been enormously influential. The concept of initiality 
(or co-initiality) that he introduced is now a fundamental concept in theoretical 
computer science applied in many subfields. The Clear formal specification lan- 
guage was the first language with general theory composition operations based 
on categorical algebra. Such generality inspired Goguen and Burstall to propose 
institutions as a meta-logical theory of logics, so that Clear-like languages could 
be defined for many logics. The OBJ language, one of the earliest and most influ- 
ential executable algebraic specification languages, also incorporated the Clear 
ideas. Categorically based module composition operations had an enormous in- 
fluence not only in formal specification, but also in software methodology: his 
parameterized programming methodology predates by about two decades more 
recent work on generic programming. These ideas, and many others that he has 
pioneered, reverberate through the pages of this volume, in which entire chapters 
are devoted to some of them. Furthermore, there are several regular scientific 
meetings of an international scope, including the CALCO and AMAST confer- 
ences and the WADT Workshop, dedicated to ideas either initiated or directly 
influenced by Joseph Goguen. There are also a number of important languages 
that have been influenced by his Clear and OBJ algebraic specification languages,
including ACT1, ML, CASL, Maude, CafeOBJ, and ELAN.

A common thread in his work is the use of abstract algebra, particularly of 
categorical algebra, to get at the core of each problem and formulate concepts 
in the most general and useful way possible. Algebraic and logical methods are 
then deployed to provide a rigorous account of meaning, both in computational 
systems and in semiotic systems. Furthermore, in areas in which social aspects 
are involved, a humanistic perspective is combined with mathematical and com- 
putational perspectives to do justice in a non-reductionist and critical way to a 
wide range of human phenomena, including phenomena arising from the use or 
misuse of computer systems in concrete social situations. 

This Festschrift volume, published to honor Joseph Goguen on his 65th birth- 
day, includes refereed papers by leading researchers in the different areas spanned 
by Joseph Goguen’s work. These papers were presented at a symposium in San 
Diego, California, June 27-29, 2006 to honor Joseph Goguen’s 65th birthday on 
June 28, 2006. Both the Festschrift volume and the symposium will allow the 
articulation of a retrospective and prospective view of a range of related research
topics by key members of the research community in computer science and other 
fields connected with Joseph Goguen’s work. We think that the papers speak for 
themselves and provide a wonderful overview of Joseph Goguen’s enormously 
influential ideas in one of the best ways possible, namely, by reflecting on how 
they have become and are part of a vast scientific dialogue. 

We feel privileged to edit this volume. For us it is a way of expressing our 
admiration, our gratitude, and our friendship to Joseph Goguen. The four of 
us worked closely together at SRI’s Computer Science Laboratory designing 
and implementing the OBJ2 language during the 1983-4 academic year. The 
scientific enthusiasm, camaraderie, and friendship of that relatively short but 
very influential period have grown over the years and have had a great impact on 
our lives. We are most grateful to all the authors who responded enthusiastically 
to our project and have contributed an excellent collection of papers for this 
volume. We are also very thankful to all those, both authors and nonauthors, 
who have helped us in the refereeing process to achieve a well-finished scholarly 
volume, and to Alfred Hofmann at Springer who has encouraged our project from 
its early stages and has provided valuable advice. Keith Marzullo and Briana 
Ronhaar at UCSD deserve very special thanks as, respectively, Local Chair of the 
Symposium and Main Local Coordinator. Funding from the US Office of Naval 
Research to partially support both this Festschrift volume and the symposium 
through ONR Grant N00014-06-1-0280 is also gratefully acknowledged. We are
particularly grateful to Ralph Wachter at ONR, who early on encouraged our 
project for the Festschrift volume and the symposium. Last but not least, we 
warmly thank Joseph Hendrix at UIUC for his invaluable and untiring help in 
preparing this volume. 


April 2006 Kokichi Futatsugi 
Jean-Pierre Jouannaud 
José Meseguer 
Abstract. This essay draws on participant observation, ethnographic interviews,
phenomenological inquiry, and recent insights from the study of swarm 
intelligence and complex networks to illuminate the dynamics of collective 
musical improvisation. Throughout, it argues for a systems understanding of 
creativity—a view that takes seriously the notion that group creativity is not 
simply reducible to individual psychological processes—and it explores 
interconnections between the realm of musical performance, community 
activities, and pedagogical practices. Lastly, it offers some reflections on the 
ontology of art and on the role that music plays in human cognition and 
evolution, concluding that improvising music together allows participants and 
listeners to explore complex and emergent forms of social order. 


1 Introduction 


The nature of creativity in the arts and sciences has been a topic of enduring human
interest. But the dominant scholarly approach to the subject, until recently, has 
proceeded from the assumption that creativity is primarily an individual psychological 
process, and that the best way to investigate it is through the thoughts, emotions, and 
motivations of those individuals who are already thought to be gifted or innovative. 
In the past several decades, however, researchers have begun to focus more attention 
on the historical and social factors that shape and define creativity, and on its role in 
everyday activities and learning situations.¹ Yet despite this shift in the field towards
a systems perspective, the notion that creativity operates primarily on the level of 
individuals (albeit now situated within a rich and complex environment), or that 





1 This shift is attributed in great part to the work of Mihaly Csikszentmihalyi [1], who has
argued for a systems view of creativity. The work of sociologist Howard Becker has also been 
influential in this regard, as well as foundational work in sociology of knowledge (Mannheim), 
activity theory (Vygotsky), communities of practice (Lave and Wenger), ethnomethodology 
(Garfinkel), and ecological psychology (Gibson). 


K. Futatsugi et al. (Eds.): Goguen Festschrift, LNCS 4060, pp. 1-24, 2006. 
© Springer-Verlag Berlin Heidelberg 2006 
creativity necessarily results in a creative product, has proved to be remarkably
resilient. 

The practice of improvising music together calls into question many of these 
assumptions. The activity is both intrinsically collaborative and inherently 
ephemeral. Since roughly the middle of the last century, an eclectic group of artists with
diverse backgrounds in contemporary jazz and classical music—and increasingly in 
electronic, popular, and world music traditions as well—have pioneered an approach to 
improvisation that borrows freely from a panoply of musical styles and traditions and 
at times seems unencumbered by any overt idiomatic constraints. This musical 
approach, often dubbed “free improvisation,” tends to devalue the two dimensions 
that have traditionally dominated music representation—quantized pitch and metered 
durations—in favor of the micro-subtleties of timbral and temporal modification and 
the surprising and emergent properties of collective creativity in the moment of 
performance.²

In the community of free improvisers it is not uncommon for musicians to speak of 
the importance of developing a “group mind” during performance. This requires, at 
the very least, cultivating a sense of trust or empathy among group members, and, 
according to some, it may also involve reaching a certain egoless state in which the 
actions of individuals and the group perfectly harmonize. Percussionist Adam 
Rudolph described his trio’s approach to me this way: “We all participate in creating 
the musical statement of the moment. In the process of being free as a collective, you 
have to have selflessness to give yourself to the musical moment and not come from a 
place of ego.”³

In the moment-to-moment dynamics of improvised performance it can also be 
difficult to separate individual contributions and intentions from those cultivated by 
the “group mind.” Bassist Richard Davis explains: “Sometimes you might put an idea 
in that you think is good and nobody takes to it... And then sometimes you might put 
an idea in that your incentive or motivation is not to influence but it does influence.”⁴
Acknowledging this inherent complexity, saxophonist Evan Parker finds that: 


However much you try, in a group situation what comes out is group music and 
some of what comes out was not your idea, but your response to somebody else’s 
idea... The mechanism of what is provocation and what is response—the music is 
based on such fast interplay, such fast reactions that it is arbitrary to say, "Did you 
do that because I did that? Or did I do that because you did that?" And anyway the 
whole thing seems to be operating at a level that involves...certainly intuition, and 
maybe faculties of a more paranormal nature.⁵


Research on creativity has tended to make a distinction between an ideation stage, 
in which the non-conscious brain produces novelty through divergent thinking, and an 
evaluation stage, in which the conscious mind decides which new ideas are coherent 





2 For two useful starting points on the web, covering principally the US and European scenes
respectively, see www.restructures.net and www.shef.ac.uk/misc/rec/ps/efi/. See also Bailey 
[2]. 

3 Quoted in [7], p. 80.

4 Quoted in [6], p. 88.

5 Quoted in [8], p. 203.
with the creative domain. From a systems perspective, however, ideation and
evaluation may occur in individuals in a complex rather than a linear fashion, and 
during ensemble performances they may become externalized into a group process. 
Keith Sawyer [3], in his recent book titled Group Creativity, expands Mihaly 
Csikszentmihalyi’s [4] well-known notion of “flow” (in which the skills of an
individual are perfectly matched to the challenges of a task, and during which action
and awareness become phenomenologically fused) to include the process of entire
groups performing at their peak.⁶ Group flow, according to Sawyer, can inspire
individuals to play things that they would not have been able to play alone or would 
not have explored without the inspiration of the group. Yet as a collective and 
emergent property, group flow can be extremely difficult to study empirically. 
Sawyer describes it as a property of performing groups that cannot be
explained by psychological studies of the mental states or the subjective experiences of
the individual members of the group.

Models that focus on the creativity of individuals are not wrong, but like 
Newtonian science, they may be inappropriate for trying to make sense of certain 
types of phenomena. What we need are new models operating at a different level. In 
the increasingly complex and interconnected world that we inhabit it is becoming 
apparent that structure and organization can emerge both without lead and even 
without seed. What happens and how it happens depends on the nature of the 
network. 

What implications do the study of group musical performance and the study of 
complex network dynamics have for musical scholarship and more broadly for our 
understandings of human creativity? In music, networks organize not only the social 
world of performance (with whom you play) but also the ideascapes of creativity (by 
whom you are influenced and what or how you choose to create) and the dynamics of
communities (how historical, cultural, and economic factors often dictate which 
musicians and musical ideas gain notice and prestige). Networks make 
communication and community possible, but they can also concentrate power and 
opportunities in the hands of a few. In this essay I explore the dynamics of group 
musical improvisation and recent insights from the study of swarm intelligence and 
complex networks in order to investigate some ways in which musical studies might 
productively grapple with the complex of factors that establish, maintain, expand, and 
even destroy musical communities. 


2 Insect Music 


At one level, improvisation can be compared with the ultimate otherness of an ant
colony or hive of bees. Perhaps it was no coincidence that in the wake of drummer 
John Stevens and the Spontaneous Music Ensemble, certain strands of English 
improvised music were known, half-disparagingly as insect music. 

David Toop [9], p. 247 


6 Sawyer draws heavily on ethnographic work by Paul Berliner [5] and Ingrid Monson [6] for
his perspective on jazz and improvisation. 
Improvisation is not a revolution that pits itself against codification; it is diffuse.
Like ants stripping a carcass, it works from the inside and outside of codes. 
John Corbett [10], p. 237 


In Euro-American art-music culture this binary [between composition and 
improvisation] is routinely and simplistically framed as involving the “effortless 
spontaneity” of improvisation, versus the careful deliberation of composition—the 
composer as ant, the improviser as grasshopper. 

George Lewis [11], p. 38 


Scientists, artists, and laypeople alike have for centuries watched in wonder as a 
flock of birds spontaneously takes flight and navigates in perfect harmony, or as a 
hive of bees throws off a collective swarm into the air. At the dawn of the twentieth 
century, the Belgian poet Maurice Maeterlinck wondered, “Where is ‘this spirit of the 
hive’...where does it reside? What is it that governs here, that issues orders, foresees 
the future?”⁷ We now know that within the swarm a half dozen or so anonymous
workers scout ahead to check for possible hive locations. When they report back to 
the swarm, they perform an informative dance, the intensity of which corresponds to 
the desirability of the site they scouted. Deputy bees follow up on the more 
promising reports and return to either confirm or disconfirm the desirability of the 
new location. Although it is rare for a single bee to visit more than one potential site, 
through the process of compounding emphasis, the more desirable sites end up getting 
the most visitors. In other words, the hive chooses: the biggest crowd eventually 
provokes the entire swarm to dance off to its new location. 
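
The compounding emphasis described above can be made concrete with a small simulation. The following Python sketch is an illustration only and is not a model taken from the sources cited in this essay: the function name choose_hive_site, the site names, the quality scores, and the recruitment probability are all assumptions. Each idle bee follows at most one dance, so no bee visits a second site, and a dance attracts followers in proportion to the site's desirability multiplied by the crowd already gathered there.

```python
import random


def choose_hive_site(qualities, n_bees=1000, n_scouts=6, rounds=30,
                     follow_prob=0.5, seed=0):
    """Toy model of site selection by compounding emphasis (all values assumed).

    qualities maps a site name to a desirability score greater than zero.
    Returns a dict mapping each site name to its number of visitors.
    """
    rng = random.Random(seed)
    sites = list(qualities)
    visitors = {site: 0 for site in sites}

    # A handful of anonymous scouts inspect the candidate sites in turn.
    for i in range(n_scouts):
        visitors[sites[i % len(sites)]] += 1
    idle = n_bees - n_scouts

    for _ in range(rounds):
        # Dance intensity: desirability of the site times the crowd already there.
        intensity = [qualities[site] * visitors[site] for site in sites]
        # Each idle bee decides independently whether to follow a dance this round.
        recruits = sum(1 for _bee in range(idle) if rng.random() < follow_prob)
        # Followers pick a site in proportion to dance intensity, visit it once,
        # and join its dancers; no bee ever visits a second site.
        for site in rng.choices(sites, weights=intensity, k=recruits):
            visitors[site] += 1
        idle -= recruits

    return visitors


if __name__ == "__main__":
    tally = choose_hive_site({"hollow oak": 0.9, "rock crevice": 0.5, "open branch": 0.3})
    print(tally, "->", max(tally, key=tally.get))
```

In this sketch the largest crowd stands in for the swarm's decision. No individual bee compares sites; the preference for the better site appears only in the tally.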

We can sense in this and other examples of complex and decentralized decision- 
making certain qualities that appear to inform all life. William Morton Wheeler, the 
founder of the study of social insects, argued as early as 1911 that an insect colony
operates as a type of superorganism: “Like a cell or the person, it behaves as a unitary 
whole, maintaining its identity in space, resisting dissolution...neither a thing nor a 
concept, but a continual flux or process.”⁸ Even the sound of the swarm can fascinate
human ears. For her aptly titled “Bee Project,” kotoist and multimedia artist Miya 
Masaoka positioned a glass-enclosed bee hive of 3,000 bees in the center of the
stage and amplified, manipulated, and blended its sounds with those from a trio of 
improvisers, all according to the instructions in her score. Later versions of the same 
work have used spatialization software to twist and tilt the sound of the hive so that 
listeners can be sonically located within the swarm. 

As the three quotes offered at the beginning of this section illustrate, there are 
several ways in which we might wish to locate musical connections to the swarm. 
Some improvised music provokes such quick reactions from players and evokes such 
complicated and dense soundscapes for listeners that a literal analogy to a swarm of 
insects may seem rather appropriate. And the ways in which individual improvisers 
can be heard to be “picking at” a shared body of modern techniques and sensibilities 
but in resolutely individualistic ways, or to be following their own creative spark 
while also being sensitive to and dependent on the evolving group dynamic, may 


7 Quoted in [12], p. 7. Maeterlinck’s book is available online at
http://www.eldritchpress.org/mm/b.html#toc.
8 Quoted in [12], p. 7. 
bring to mind the behavior of social insects that seem to have their own agenda while
also working in ways that organize the group without supervision. Finally, the notion 
of “insect music” has perhaps become most associated with a type of generative 
compositional scheme, and often with the power of computers to create complex 
patterns from relatively simple materials, such that questions about the ways in which 
creativity may be facilitated or constrained and the ways in which cultural 
understandings may be reflected, reshaped, or remain concealed in this type of work 
become particularly important. 

In addition to being an extremely skilled improviser, the English drummer John 
Stevens will always be remembered for his instrumental role in developing the scene 
at The Little Theater Club in London that nurtured many in the first generation of 
English free improvisers. One of his early pedagogical approaches was titled Click 
Piece, and it included little more than the instruction to play the shortest sounds on
your instrument.⁹ In the collective setting, however, one would gradually become
aware of an emergent group sound. As David Toop [9] explains, “The piece seemed 
to develop with a mind of its own and almost as a by-product, the basic lessons of 
improvisation—how to listen and how to respond—could be learned through a careful 
enactment of the instructions” (pp. 242-3). Stevens’s Click Piece highlights one of the
central aspects of swarm dynamics: relatively simple decentralized activities can
produce dramatic, self-organizing behaviors. 

In the scientific community, a growing number of researchers are exploring new 
ways of applying swarm intelligence (or SI) to diverse situations.¹⁰ For instance, the
foraging of ants has led to improved methods for routing telecommunications traffic 
in a busy network. The way in which insects cluster their dead can aid in analyzing 
bank data. The distributed and cooperative approach used by many social insects to 
transport goods and to solve navigational problems has led to new insights in the 
fields of robotics and artificial intelligence. And the evolving division of labor in 
honeybees has helped to improve the organization of factory assembly line workers 
and equipment. As Eric Bonabeau and Guy Théraulaz [15] see it: “The potential of 
swarm intelligence is enormous. It offers an alternative way of designing systems 
that have traditionally required centralized control and extensive preprogramming” 
(p. 79).

Beyond these business and technological applications, however, one of the main 
lessons of contemplating SI is that organized behaviors can develop in decentralized 
ways. Can exploring and thinking about SI affect the way we make and think about 
music? It remains difficult for many people to envision complex systems organizing 
without a leader since we are often predisposed to think in terms of central control 
and hierarchical command. The notion that music can be organized in complex ways 
without a composer or conductor still leaves many scratching their heads in doubt. 
Scientists have also been predisposed in the past to look for chains of command, 
instances of clear cause and effect. But the emerging field of SI demonstrates that 
complex behaviors and efficient solutions can be arrived at without a leader, 
organized without an organizer, coordinated without a coordinator. 





9 Stevens titled the reverse strategy “Sustained Piece.”

10 Although this field is often presented as evolving in only the past few years, examples drawn 
from the world of social insects can be found in early cybernetics theory [13], pp. 156-7 and in 
dissipative structures as well [14], pp. 181-6. 
The secret of the swarm lies in the intercommunication of its members. Through
direct and indirect interactions among autonomous agents and between agents and 
their environment, swarm systems are able to self-organize in decentralized, robust, 
and flexible ways. Bonabeau, Théraulaz, and Marco Dorigo [16], a physicist, 
biologist, and engineer working together at the Santa Fe Institute, offer a list of four 
basic ingredients that through their interplay can manifest in swarm intelligence: 1) 
forms of positive feedback, 2) forms of negative feedback, 3) a degree of randomness 
or error, and finally 4) multiple interactions of multiple entities. 

Positive feedback in SI can be usefully summarized as simple “rules of thumb” that 
promote the creation of structures: activities such as recruitment and reinforcement. 
Negative feedback counterbalances positive feedback and helps to stabilize the 
system: it may take the form of saturation, exhaustion, or competition. A certain 
degree of randomness or error is also crucial, since it enables the discovery of new 
solutions and produces fluctuations that can act as seeds from which new structures 
develop. Finally, SI generally requires a minimum density of mutually tolerant 
individuals, since individuals should be able to make use of the results of their own 
activities and the activities of others. 
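
The interplay of these four ingredients can be sketched in a few lines of Python. The model below is an assumption made for illustration, loosely patterned on pheromone-trail foraging between a nest and a food source; it is not drawn from [16]. The function name forage, the two path lengths, and every parameter value are hypothetical. Pheromone evaporation stands in for negative feedback here, although the paragraph above names saturation, exhaustion, and competition as its usual forms.

```python
import random


def forage(path_lengths, n_ants=50, steps=200, evaporation=0.1,
           explore=0.05, seed=0):
    """Toy trail-laying model combining the four ingredients (values assumed).

    path_lengths maps a path name to its length (greater than zero).
    Returns a dict mapping each path name to its final pheromone level.
    """
    rng = random.Random(seed)
    paths = list(path_lengths)
    pheromone = {path: 1.0 for path in paths}

    for _ in range(steps):
        deposits = {path: 0.0 for path in paths}
        # 4) Multiple interactions: many ants act on the same shared trails.
        for _ant in range(n_ants):
            if rng.random() < explore:
                # 3) Randomness: occasionally ignore the trails altogether.
                choice = rng.choice(paths)
            else:
                # 1) Positive feedback: stronger trails recruit more ants.
                weights = [pheromone[path] for path in paths]
                choice = rng.choices(paths, weights=weights)[0]
            # Shorter paths are reinforced more strongly per trip.
            deposits[choice] += 1.0 / path_lengths[choice]
        for path in paths:
            # 2) Negative feedback: evaporation keeps any trail from growing
            # without bound and lets unused trails fade.
            pheromone[path] = (1.0 - evaporation) * pheromone[path] + deposits[path]

    return pheromone


if __name__ == "__main__":
    print(forage({"short": 1.0, "long": 2.0}))
```

Setting explore to zero or evaporation to zero in this sketch removes one ingredient at a time, which is a simple way to see what each contributes to the collective outcome.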

While something of a general and descriptive list, these ingredients do play 
important roles in collective improvisation. Through positive feedback musicians not 
only develop their own ideas from a kernel of inspiration, but they also work together 
to support the ideas of others and the evolving ensemble sound. They “recruit” others 
to support or sustain their own developments, or they may choose to “reinforce” the 
creative direction of others instead. Similar to the ways in which information about 
the best food source or the shortest path can be compounded among a swarm of bees 
or a colony of ants, positive feedback increases the ability of an improvising group to 
follow the more “promising” of many concurrent ideas being pursued by various 
members. 

Negative feedback in improvisation helps to keep things interesting. By 
intentionally looking elsewhere for new ideas or new musical areas to explore, 
individuals can either signal transitions away from ensemble moments that have 
lingered too long or seem to be going nowhere (the feelings of saturation and 
exhaustion), or they can productively layer divergent sonic qualities and musical ideas 
together or provoke others to boost their own creativity (through a competitive 
element). Negative feedback helps to maintain a balance in the evolving 
improvisation so that one idea does not continue to amplify indefinitely (although a 
more static approach can produce interesting results as well). 

Unexpected occurrences, in the form of randomness or error, often provide both 
source material and inspiration for individuals and groups to explore new sonic 
territory, musical techniques, and interactive strategies. Noticing and capitalizing on 
unexpected fluctuations as an improvisation unfolds can produce important structural 
cues, developments, and transitions, and it represents a particular joy of improvised 
music making in general. Without this third ingredient, groups of improvisers who 
work together over a longer period of time might become too familiar with one 
another’s musical language and approach or might fall into regular strategies of 
support and counterbalance (and this of course does happen). 

Finally, the notion that individuals and the group as a whole benefit from multiple 
interactions and perspectives is something of an axiom in ensemble forms of 
improvisation and in the community of improvisers. One of the particular challenges 
of contemporary improvisation, for both players and listeners, is to remain aware of
and sensitive to the many musical gestures and processes circulating between 
members of the group in the moment of performance and between members of the 
community as ideas circulate via recordings, impromptu meetings, and the 
overlapping personnel of various working groups. 

In much freer improvisation, the collective pattern of the group is more important 
than any of the individual actions heard in isolation. But this does not deny freedom 
to individual musicians. Saxophonist Evan Parker [17] highlights the ways in which 
freedom works within the collective unfolding of what might easily be termed swarm 
dynamics: 


The freedom is of course that since you and your response are part of the context 
for other people, and they have that function for you, it's very hard to unravel the 
knots of why anybody is doing what they do in a given context. I think it's pretty 
clear that you could sort of go with the flow, or you could go against the flow. 
And sometimes what the music really needs is for you to go with the flow, and 
sometimes what it really needs is for you to do something different. Or anybody, 
somebody, to do something different. So that's why people improvise, presumably, 
because they want the freedom to behave in accordance with their response to the 
situations. But since their response then becomes part of the new situation for the 
other players, it's very hard to say why a particular sequence of events unfolds in 
the way it does. But we get used to following the narrative of improvisational 
discourse... 


Parker’s notion that “the music” needs for things to happen, needs for musicians to 
do things, is a fairly common way in which improvisers speak about the process of 
performance. In his liner notes to the album In Order to Survive, bassist William
Parker (no relation) writes that “Creative Music is any music that procreates itself
as it is being played to ignite into a living entity that is bigger than the composer and 
player.” While these comments certainly resonate with the notion of a 
superorganism touched on earlier, they may also highlight an additional dimension of 
SI research: interactions within a swarm can be both direct and indirect. The direct 
interactions are the obvious ones: with ants this can involve antennation or 
mandibular contact, food or liquid exchange, chemical contact, etc. But indirect 
interactions are more subtle. In SI they are referred to by the rather cumbersome term 
stigmergy (from the Greek stigma: sting, and ergon: work). Stigmergy describes the 
indirect interaction between individuals when one of them modifies the environment 
and the other responds to the new environment rather than directly to the actions of 
the first individual. This helps to describe the process of “incremental construction” 
that many social insects use to build extremely complex structures or to arrange items 
in ways that might at first seem arbitrary or random. And because positive feedback 
can produce nonlinear effects, indirect interaction can result in dramatic bifurcations 
when a critical point is reached: for example, some species of termites alternate 





11 Here we might also want to envision the creative process of each individual as a type of
swarm dynamic, as the processes of ideation and evaluation can work rapidly and in complex 
and nonlinear ways. 

12 Black Saint records 12015902 (1995).
between non-coordinated and coordinated building to produce neatly arranged pillars
or strips of soil pellets. 
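
Stigmergy, as described above, can be illustrated with a deliberately simple Python sketch. Everything in it is an assumption made for the sake of the example: the function name build_by_stigmergy, the ring of cells, and the pick-up and drop probabilities are hypothetical and do not come from the SI literature cited here. The agents never sense one another. Each responds only to the pellets it finds at its current position, yet pellets that begin evenly spread end up gathered on fewer cells.

```python
import random


def build_by_stigmergy(n_cells=40, n_agents=5, steps=5000, seed=0):
    """Toy pellet-piling model in which agents interact only via the ground.

    Returns (ground, carried): pellet counts per cell on a ring, and the
    number of pellets still being carried when the run ends.
    """
    rng = random.Random(seed)
    ground = [1] * n_cells                      # one pellet on every cell
    positions = [rng.randrange(n_cells) for _ in range(n_agents)]
    carrying = [False] * n_agents

    for _ in range(steps):
        for a in range(n_agents):
            # Wander one cell left or right around the ring.
            positions[a] = (positions[a] + rng.choice((-1, 1))) % n_cells
            here = ground[positions[a]]
            if carrying[a]:
                # Drop more readily on larger piles; never on bare ground.
                if rng.random() < here / (here + 1):
                    ground[positions[a]] += 1
                    carrying[a] = False
            elif here > 0:
                # Pick up readily from small piles, rarely from large ones.
                if rng.random() < 1 / here ** 2:
                    ground[positions[a]] -= 1
                    carrying[a] = True

    return ground, sum(carrying)


if __name__ == "__main__":
    ground, carried = build_by_stigmergy()
    print(ground, carried)
```

Because a laden agent in this sketch never drops a pellet on an empty cell, each cleared cell stays clear and the number of piles can only shrink. That one-way rule is a crude stand-in for the critical points mentioned above, not a model of any particular species.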

But swarm intelligence has its limits and its drawbacks. Social insects can adapt to 
changes in their environment, but only within a certain degree of tolerance. For 
instance, many social insects are able to seek out and find new food sources when an 
existing one is exhausted, or some species are able to reallocate labor roles if the 
number of required workers for a specific task dwindles, all without explicit 
instruction. But the “army ant syndrome” offers a compelling example of the limits to 
this adaptability and of swarm intelligence in general. Among army ants, when a 
group of foragers accidentally gets separated from the main colony, the separated 
workers run in a densely packed “circular mill” until they all eventually die from 
exhaustion. Although able to function well within the group under normal 
circumstances, an unpredictable perturbation of a large enough degree can destroy the 
colony’s cohesiveness and make it impossible for the group to recover. 

For a musical analogy, while sensitivity to the group is an essential component of 
improvised performance, to blindly base one’s own playing on what others do or to 
simply follow the group as an overriding strategy can lead to rather inflexible and 
ineffective results, producing a musical “circular mill.” And many improvisers, if 
they sense that all of the participants are following each other too carefully, will “go 
against the grain” or “forge out on their own” into new sonic territory; in other words, 
they will defy the logic of the hive mind. To return briefly to our earlier example of 
John Stevens’s Click Piece, although this generative approach to collective 
improvisation offered an effective way to make “quite ravishing” music with a large 
ensemble comprising players of mixed ability and experience, to more skillful and 
confident musicians it quickly became an unproductive limitation. Simplifying the 
parameters for improvisation can be useful and even necessary for making large 
ensembles swarm effectively, but in the more intimate setting of a small group, 
arguably the preferred arrangement for the majority of free improvisation enthusiasts, 
a less restrictive framework is usually desired. 

The cohesion of small groups can also be jeopardized by imbalances that lead to 
polarization. Drawing on research with decision-making among corporate boards and 
committees, James Surowiecki [18] identifies a few qualities that appear to factor into 
all intimate social settings: earlier comments are more influential; higher status people 
talk more and more often; and status is not always derived from 
knowledge/experience. Since constantly making comparisons and adjustments to 
others can result in an unproductive “group think,” it is important for individuals to 
champion their own ideas in small group settings. But too much vehemence in this 
can lead to a completely polarized setting or to an “information cascade” when others 
are subsumed by a singular view or opinion. In short, deference to the ideas of others 
is important, but so is dissent when required. 

Without a doubt there are important differences in the degrees of freedom allowed 
in a swarm of bees and in the sonic swarm of collective improvisation. But if 
interesting complexities can emerge from groupings of individuals with a limited 
array of communication possibilities, how much more can we expect from 
experienced and creative artists? J. Stephen Lansing [19], an anthropologist who also 
serves as external faculty at the Santa Fe Institute, wonders about complex adaptive 
systems in general: “What if the elements are not cells or light bulbs but agents 
capable of reacting with new strategies or foresight to the patterns they have helped to 
create?” (p. 194). Much of the current research by social scientists on complex
adaptive systems is concerned with precisely this question. 

The field of SI is still very much in its infancy. It is often extremely difficult for 
researchers to understand the inner workings of insect swarms and the variety of rules 
by which individuals in a swarm interact. Even in those cases when we can 
understand the behaviors of individuals, we may still be unable to predict or 
understand the dynamics of the overall system since countless other environmental 
factors come into play. When transposed into the realm of humans, these 
uncertainties only compound themselves. Discussing the business and technological 
applications of SI, Bonabeau and Théraulaz [15] confess that: “Although swarm- 
intelligence approaches have been effective at performing a number of optimization 
and control tasks, the systems developed have been inherently reactive and lack the 
necessary overview to solve problems that require in-depth reasoning techniques” 
(p. 79). We still don’t know enough about social insects, much less social humans, to
be able to understand how certain group behaviors emerge and evolve. 

Nevertheless, the notion that a group can have capacities and capabilities that 
extend beyond the scope of any of its participating members is a powerful one. In a 
provocative chapter titled “Hive Mind” from his book Out of Control, Kevin Kelly 
[12] points out that the hive does possess much that none of its parts possesses. Not 
only does swarm intelligence represent a type of distributed perception for the hive, 
but the hive also possesses a type of distributed memory; the average honeybee 
operates with a memory of six days, but the hive as a whole operates with a 
distributed memory of up to three months, twice as long as the lifetime of the average 
bee. Bonabeau et al. [16] write: 


We suggest that the social insect metaphor may go beyond superficial 
considerations. At a time when the world is becoming so complex that no single 
human being can really understand it, when information (and not the lack of it) is 
threatening our lives, when software systems become so intractable that they can 
no longer be controlled, perhaps the scientific and engineering world will be more 
willing to consider another way of designing “intelligent” systems where 
autonomy, emergence and distributed functioning replace control, preprogramming,
and centralization (p. 22).


We might also hope that the music world will continue to explore ways of 
organizing sonic and social experiences that do not hinge on centralized notions of 
control. Well aware of these concerns, trombonist/composer/scholar George Lewis 
[20] writes in a recent essay reflecting on improvisation and the orchestra: 


Orchestra performers operate as part of a network comprised not only of 
musicians, composers and conductors, but also administrators, foundations, critics 
and the media, historians, educational institutions, and much more. Each of the 
nodes within this network, not just those directly making music, would need to 
become “improvisation-aware,” as part of a process of resocialization and 
economic restructuring that could help bring about the transformation of the 
orchestra that so many have envisioned. 

3 A Web Without a Spider


If group improvisation may be heard in its best moments to demonstrate complex and 
emergent properties that are somehow greater than the sum of its parts, then 
investigating individuals and ensembles in isolation from the network of surrounding
influences will not suffice. And as we move our gaze further into the social and 
historical realms, the notion that any one individual is controlling their own web of 
musical sounds and meanings becomes rather untenable. We need to reorient our 
analytical framework to take account of the dynamics that occur in ensembles as they 
perform together over days, weeks, months, and even years. And we need to 
acknowledge the ways in which influences in musical communities circulate through 
more than the sounds of performances and recordings; meaning is everywhere, not 
simply in the “sounds themselves.” The networks involved include a host of social 
conventions and material artifacts that affect the ways in which music is made and 
heard: from the funding sources or media attention that a performer may receive to 
the casual conversations or critical reviews that a performance may provoke. While it 
may be fairly common to acknowledge the subtle influence that specific audiences 
and venues can have on performance, especially in relation to improvisation, the 
network of material, economic, technological, educational, and social factors at play, 
and the complex meanings that they generate through their interactions, are far more 
involved than that. In fascinating ways, this network-style organization both shapes 
and is shaped by the activity of all of its participants; everyone changes the state of 
everyone else. Although the spontaneous and surprising occurrences in improvised 
performance can attract our immediate attention, it is through the dynamic interplay 
of social, material, and sonic culture that we begin to sense the true lifeblood of the 
music. 

Although networks have interested researchers for decades, until recently, each 
system tended to be treated in isolation, with little apparent reason or possible means 
to see if its organizational dynamics had anything in common with other networks. 
We are only now beginning to piece together some important qualities of, and 
approaches to, the study of complex dynamic networks on a broad scale. But Albert-László
Barabási [21], one of the leading researchers in this still nascent field,
optimistically predicts: “Network thinking is poised to invade all domains of human 
activity and most fields of human inquiry. It is more than another helpful perspective 
or tool. Networks are by their very nature the fabric of most complex systems, and 
nodes and links deeply infuse all strategies aimed at approaching our interlocked 
universe” (p. 222). 

The notion of networks may bring to mind rather bare-boned models of how things 
are connected. To some extent this is true, since simplifying detail on one level of a 
network can highlight organizational similarities on another that would otherwise go 
unnoticed. Network models, however, are increasingly able to take account of some 
of the rich dynamics that occur when individual components are not only doing 
something—generating power, sending data, even making decisions—but also are 
affecting one another over time. Steven Shaviro [22] writes in his book Connected, 
Or What it Means to Live in the Network Society: 
As it seems to us now, a network is a self-generating, self-organizing, self-
sustaining system. It works through multiple feedback loops. These loops allow 
the system to monitor and modulate its own performance continually and thereby 
maintain a state of homeostatic equilibrium. At the same time, feedback loops 
induce effects of interference, amplifications, and resonance. And such effects 
permit the system to grow, both in size and in complexity. Beyond this, a network 
is always nested in a hierarchy. From the inside, it seems to be entirely self- 
contained, but from the outside, it turns out to be part of a still larger network (p. 
10). 


Music, as an inherently social practice, thrives on network organization. On 
perhaps the most tangible level, a musician’s livelihood and creative opportunities 
frequently depend on the breadth and depth of one’s network of social and 
professional contacts. But network dynamics shape the sounds, practices, and 
communities of music in decidedly more complex and subtle ways as well. 
Musicians are influenced by their years of training or apprenticeship, countless hours 
spent listening to music both publicly and privately, and perhaps most 
comprehensively (yet frequently least acknowledged) by the historical and cultural 
conventions of a given time and locale. The topics and techniques of music education 
also depend on these network-style dynamics, which inform the process of choosing 
canons and of exploring and imparting the intricacies of musical theory and musical 
aesthetics. Finally the music industry’s far-reaching networks of production and 
distribution, and increasingly its consolidated and insular organizational practices, 
have the power to structure, to some degree or another, the networks of inspiration
and possibility for nearly everyone who is deeply committed to music. 

Yet music researchers have in the past focused the lion’s share of attention on the 
creative work of individuals, often treating their “work” as a collection of static 
objects (e.g., scores or recordings) to be dissected and categorized. It is not 
uncommon to hear graduate students in musicology programs lamenting (or coming 
to terms with) the fact that they must find an increasingly obscure composer or 
performer on whose work to focus their “comprehensive” scholarly lens. There has, 
of course, been a pronounced and welcome shift in the past few decades towards a 
“new musicology” that takes into account the historical and cultural factors that 
influence not only the original production of a musical “work,” but also its variable 
reception, taking particular notice of gender and racial constructions that may affect 
both of these. And there has been a marked increase in the number of scholars 
interested in expanding the scope of musical investigation into popular and non- 
Western topics as the fields of ethnomusicology and popular music studies have come 
into their own. But on the whole, music scholarship is only now beginning to focus 
attention on the organizational complexities of music rather than treat it as the 
province of a few gifted and prolific individuals.

The musical community has a vested interest in understanding network dynamics, 
although individuals may vary considerably in their specific expectations. Network 
thinking can shed light on the cultural power inequities that produce imbalances in 
social and economic interactions. It may also tell us much about the spread of ideas 
in musical communities and marketplaces under diverse historical and cultural 


13 For examples, see the work of Susan McClary and Suzanne Cusick among others.
conditions. Creative musicians may hope to find in network dynamics glimpses of
future directions for innovation or influence, strategies for how to avoid or disrupt 
network hubs and established practices in hopes of alternative community 
reorganization, or the means by which they might increase their own professional 
contacts and opportunities. 

Actor-Network Theory (ANT), a sociological approach that has emerged out of 
science and technology studies, is geared towards embodying this very tension 
between the centered ‘actor’ on the one hand and the decentered ‘network’ on the 
other. As John Law [23], one of the field’s leading researchers, remarks: “In one 
sense the word [actor network theory] is thus a way of performing both an elision and 
a difference between what Anglophones distinguish by calling ‘agency’ and 
‘structure’” (p. 5).¹⁴ In short, ANT does not accept the notion that there is a
macrosocial system on the one hand, and bits and pieces of derivative microsocial 
detail on the other. According to Law: 


If we do this we close off most of the interesting questions about the origins of 
power and organization. Instead we should start with a clean slate. For instance, we 
might start with interaction and assume that interaction is all that there is. Then we 
might ask how some kinds of interactions more or less succeed in stabilising and 
reproducing themselves: how it is that they overcome resistance and seem to 
become "macrosocial"; how it is that they seem to generate the effects such as 
power, fame, size, scope or organisation with which we are all familiar. This, then, 
is one of the core assumptions of actor-network theory: that Napoleons are no
different in kind to small-time hustlers, and IBMs to whelk-stalls. And if they are 
larger, then we should be studying how this comes about—how, in other words,
size, power or organisation are generated. 


As musical traditions expand in scope and popularity, better-connected “hubs” tend 
to emerge. In jazz, for example, the "hubs" of Louis Armstrong, Duke Ellington, 
Charlie Parker, Miles Davis, and John Coltrane, among others, are impossible to 
ignore. During their lifetimes these musicians were well respected and well 
connected (although not always early in their careers and not by everyone) and their 
influence has only grown since. With the spread of institutionalized jazz education 
and the increasing reliance of major labels on re-releasing canonical jazz recordings, 
the visibility and "connectedness" of these hubs may only continue to grow. For 
instance, in the last few years Columbia, Atlantic, and Verve have all drastically 
reduced their roster of living artists in favor of re-releasing older material. Even the 
Marsalises, perhaps the most visible jazz performers today, no longer have a major 
record deal. David Hajdu [24] perceptively writes in an Atlantic Monthly spread on 
Wynton: "Where the young lions saw role models and their critics saw idolatry, the 
record companies saw brand names—the ultimate prize of American marketing. For
long established record companies with a vast archive of historic recordings, the 
economies were irresistible: it is far more profitable to wrap new covers around 
albums paid for generations ago than it is to find, record, and promote new artists" (p. 
54). 


14 For other important work in ANT see the publications of Geoffrey C. Bowker and Susan 
Leigh Star. 
15 http://www.comp.lancs.ac.uk/sociology/soc054jl.html. 
For an artistic tradition to remain dynamic and healthy the network dynamics that
take note of history and provide hubs for a common language and style should not 
become too powerful. If the disparity between the hubs and the remainder becomes 
too great, there may be a “tipping point” beyond which communication and 
innovation in a tradition can suffer dramatically.¹⁶ In the same Atlantic Monthly
article, Jeff Levinson, the former Columbia Jazz executive, is quoted as saying: "The 
Frankenstein monster has turned on its creators. In paying homage to the greats, 
Wynton and his peers have gotten supplanted by them in the minds of the populace. 
They've gotten supplanted by dead people" (p. 54).¹⁷ The disparity of attention in
music seems to be regulated through the process of interaction. This can come in the 
direct form of collaboration between artists, but also in the indirect form of media 
attention, record sales, performance opportunities, and arts funding or sponsorship. 

In what is perhaps its most radical move, ANT attempts to take account of the 
heterogeneous networks that include not only social or human dimensions, but also 
the material dimensions that make human and social behaviors possible. ANT 
explores how these heterogeneous networks come to be patterned to generate effects 
like organizations, inequality, and power. Joseph Goguen explains: 


Actor-Network theory can be seen as a systematic way to bring out the 
infrastructure that is usually left out of the “heroic” accounts of scientific and 
technological achievements. Newton did not really act alone in creating the theory 
of gravitation: he needed observational data from the Astronomer Royal, John 
Flamsteed, he needed publication support from the Royal Society and its members 
(most especially Edmund Halley), he needed the geometry of Euclid, the 
astronomy of Kepler, the mathematics of Galileo, the rooms, lab, food, etc. at 
Trinity College, an assistant to work in the lab, the mystical idea of action at a 
distance, and more, much more.¹⁸


The goals of network theory are gradually shifting from describing the topology of 
systems to understanding the mechanisms that shape network evolution. Barabási [21]
acknowledges that, “We must move beyond structure and topology and start focusing 
on the dynamics that take place along the links. Networks are only the skeleton of 
complexity, the highways for the various processes that make our world hum. To 
describe society we must dress the links of the social network with actual dynamical 
interactions between people” (p. 225). 

As in a house of mirrors, the science of networks has seemingly led us to a place in 
which all of the details matter and, to some extent, none of them do. Since at least the 
work of Emile Durkheim we have known that large-scale social phenomena—the
predictable number of Parisians who commit suicide every year—can be independent 
of the particulars—which Parisians are actually led to kill themselves and why. And 





16 For a popular science treatment of the notion of a “tipping point” see Gladwell [25].

17 For a recent example of how powerful hubs have become in jazz, the San Francisco Jazz
Spring 2005 series of concerts featured no fewer than seven tributes to the music of John
Coltrane within a month’s time, including versions of his music from the albums A Love
Supreme, Ascension, Africa Brass, Crescent, and Interstellar Space. There was also a concert
by the Mingus Big Band and a tribute to the music of Rahsaan Roland Kirk.

18 http://carbon.cudenver.edu/~mryder/itc_data/ant_dff.html.
despite the enormous complexities of the Isaac Newton example described above,
scientists in the modern era glean what they need to from Newton, usually without 
reading his original work, and they move on to more pressing concerns. 

Yet the details and vagaries of a network system do seem to matter enormously. 
Although network theory often focuses on large-scale behaviors, these large-scale 
behaviors are fundamentally provoked by the ability of one individual to influence 
another and the notion that people can change their strategies depending on what 
other people are doing. Through these dynamics alone, systems can self-organize in 
remarkably complex ways. 

In music, the practice of free improvisation is perhaps closest to this ideal of a self- 
organizing system. Its bottom-up style emphasizes possibilities for adaptation and 
emergence; it accentuates creativity-in-time and the dynamics of internal change. The 
structures of improvisation can also continue to be extended in boundless ways 
(although the system may be circumscribed, at least in part, by the abilities, materials, 
and experiences of those who are participating). From one perspective, improvised 
music is resilient to individual “mistakes” since sounds can be re-contextualized after 
the fact by either the original performer or others in the group. And if one musician 
drops out or is unable to make a performance, the system can often continue to 
function without major interruption, perhaps even organizing in ways that are both 
novel and more complex. From another perspective, however, group improvisation 
may be less resilient to personality conflicts or pronounced aesthetic differences 
between individuals. With traditional musical practices that are organized in a 
predominantly hierarchical manner, personality differences can often be managed in 
deference to the group leader, the authority of the musical score, or the 
professionalism of “getting the job done.” Free improvisation ensembles tend to aim 
for a more egalitarian organization that makes them particularly susceptible to the full 
spectrum of both musical and so-called “extra-musical” influences.¹⁹

Despite its many promising qualities, improvisation is also rarely, if ever, the 
“optimal” means to achieve a specific musical end (although it may in fact be both a 
quicker and easier route to certain types of chaotic dynamics). The internal dynamics 
of an improvising ensemble (particularly larger groupings of musicians) can be slow 
to respond to change, and are, for the most part, beyond the control of any one 
individual. Even when things do appear to work well, it will be impossible to analyze 
the system’s dynamics during or after the fact with absolute precision. As with other 
emergent forms of order, the collective dynamics of improvisation will, by definition, 
always transcend the full awareness of individuals. For these and other reasons, many 
ensembles choose to adopt certain compositional schemes or devices in order to offer 
some additional degrees of control over the situation. There is no guarantee, 
particularly in individual performances, that divergent components will find ways to 
self-organize effectively.²⁰ In general, however, the improvising music community





19 For a related discussion see [26].

20 It is interesting to note that, for a music predicated on what can be a very risky endeavor—to
improvise collectively in a group setting—accounts of failure can be very difficult to locate in
both the academic and trade coverage of the music. Similar to mechanical systems, we may
learn as much or even more by examining occasions on which improvised performance appears
to falter.
demonstrates the remarkable ability to absorb the new and the diverse without
disruption. 

Individual ensembles will often, over time, establish their own sense of identity or 
coherence. The boundary that develops naturally within an ensemble is not 
necessarily one of personal affinity or exclusion, or one of aesthetic mandate, but 
rather one of trust and conviviality. Like the boundary of a storm or the membrane of 
a human cell, this boundary is both permeable and permanent. It defines the identity 
of the system but also allows for the ongoing dynamics of exchange that are necessary 
to maintain its existence. Of course, a certain danger may lurk for both physical and 
musical systems if this boundary becomes either too porous or too impermeable. If 
too much exchange is fostered with outside forces, the identity of a system may be put 
in jeopardy. Likewise, if too little exchange is allowed or encouraged, a system may 
decline either from reduced internal dynamics, or from its inability to continue to 
adapt to the changing dynamics of its environment. 

Network theory tells us that very different things can be connected through 
surprisingly short distances. Small causes can have large effects, while at other times
large disturbances may be absorbed without much notice. Although the predictive 
power of network theory is still an open question, it may be enough that through these 
perspectives and approaches we can gain a better understanding of the structure of 
connected systems and the way that different sorts of influences propagate through 
them. Duncan Watts [27], another leading voice in the field, reminds us that, 
“Darwin’s theory of natural selection, for instance, doesn’t actually predict anything. 
Nevertheless, it gives us enormous power to make sense of the world we observe, and 
therefore (if we choose) to make intelligent decisions about our place in it” (p. 302).

Although only limited work has been done on large-scale music networks to date, 
one study that explored the relationships between jazz musicians from 1912 to 1940 
found so-called “small world” properties. By using the Red Hot Jazz Archive 
database on the Internet, Pablo Gleiser and Leon Danon [28] found that, on average, 
only 2.79 steps separated early jazz musicians from one another. Their model also 
captured the clustering of jazz musicians by geography, with New York and Chicago 
as the major hubs, and by race, due to the highly segregated nature of the music 
industry at the time. As in most human networks, a few individuals had very high 
degrees of connectivity. Guitarist Eddie Lang topped their list, with connections to 
415 other musicians, while artists like Jack Teagarden, Joe Venuti, and Louis 
Armstrong were all in the top 10 of most connected musicians. UCSD Professor 
Richard Belew and I are beginning a similar project to study the network dynamics of 
musical communities using discographic information that will take account of more 
contemporary artists as well. 
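
As a rough illustration of the two measures mentioned above (how many collaborators a musician has, and how many steps separate one musician from another on average), the following minimal sketch builds a collaboration network from a handful of recording sessions. The session list and the single-letter musician names are hypothetical placeholders, not data from the Gleiser and Danon study, and the function names are chosen here for illustration only.

```python
from collections import deque
from itertools import combinations

# Hypothetical recording sessions: each entry lists musicians who played together.
# These placeholders are assumptions for illustration, not archival data.
SESSIONS = [
    ["A", "B", "C"],
    ["C", "D"],
    ["D", "E", "F"],
    ["A", "F"],
    ["F", "G"],
]


def build_graph(sessions):
    """Link every pair of musicians who share at least one session."""
    graph = {}
    for session in sessions:
        for name in session:
            graph.setdefault(name, set())
        for a, b in combinations(session, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bfs_distances(graph, source):
    """Number of steps from source to every musician reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_path_length(graph):
    """Mean number of steps over all ordered pairs that are connected."""
    total = 0
    pairs = 0
    for source in graph:
        for target, steps in bfs_distances(graph, source).items():
            if target != source:
                total += steps
                pairs += 1
    return total / pairs if pairs else 0.0


def degrees(graph):
    """Number of distinct collaborators for each musician."""
    return {name: len(neighbors) for name, neighbors in graph.items()}


if __name__ == "__main__":
    g = build_graph(SESSIONS)
    deg = degrees(g)
    hub = max(deg, key=deg.get)
    print(f"most connected: {hub} ({deg[hub]} collaborators)")
    print(f"average separation: {average_path_length(g):.2f} steps")
```

In a toy network like this one, a single well-connected player shortens the paths between everyone else, which is the same pattern the study reports for figures such as Eddie Lang.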

Through the wonders of modern network technologies we can now connect to the 
farthest reaches of the globe in an instant. And with more than a century of recorded 
music available to us, we can easily engage with sounds that are similarly removed 
from us, both culturally and historically. But in the age of iPods and web surfing we 
also experience the world in increasing isolation at the same time. Yet the 
resoundingly social nature of music, when viewed as performance rather than product, 
offers the possibility for humans to synchronize their ears, brains, and bodies in ways 
that may be unavailable otherwise. And improvised music’s particular penchant for 
the emergent and unexpected may even allow us to explore and expand our own 
homophily parameter—the sociological tendency of like to associate with like—as


16 David Borgo 


familiar and less familiar sounds and people join together to find a common ground, 
even if only temporarily.²¹


4 Harnessing Complexity 


How can these practices be nurtured, particularly within the rather serious and sedate 
halls of the music academy? The jazz community has traditionally stressed a type of 
learning that might be called in contemporary discourse embodied, situated, and 
distributed.²² Not only have many performers stressed the full integration of aural,
physical, and intellectual aspects of the music, but the notion that learning and 
development can only occur within a supportive community is seen as paramount. 
The Association for the Advancement of Creative Musicians (AACM) in Chicago and 
the Creative Music Studio (CMS) in Woodstock, NY are two of the better-known 
examples of this pedagogical orientation. In the standard music academy, however, 
the study of musical improvisation has often been shoehorned into the conventional 
curriculum or simply not addressed at all. 

When addressed, institutionalized approaches to teaching musical improvisation 
have tended to stress individual facility through memorization and pre-planning, 
leaving little room for collective experimentation. Jonty Stockdale [29] finds that: 
“[I]mprovisation in jazz studies programmes is infrequently developed through a
collective process, with a preference for the development of soloing facility through 
the absorption and imitation of pre-existing language, usage, and style. Whilst this is 
regarded as important for the development of a young jazz musician, matters of self- 
expression, individualism, and most importantly experimentation are often left to later 
stages, by which time exploration of free collective playing can appear unnecessary or 
even redundant” (p. 109). 

In his account of group creativity, Keith Sawyer [3] makes an important distinction 
between a problem-solving and a problem-finding approach to art. Artists adopting 
problem-solving techniques begin with a relatively detailed plan and work to 
accomplish it successfully. Those employing a problem-finding approach, by 
contrast, search for interesting problems as the work unfolds in an improvisatory 
manner. Many beginning jazz improvisers are stuck in a problem-solving mode. As 
pianist/composer Anthony Davis expressed to me in a recent interview: “They have 
been taught right and wrong—these are the notes, these are the chords, these are the
arpeggios that work on a given chord. This chord happens on the 5th bar [in a blues].”
But through extended listening, practicing, and playing with musicians who are more 
experienced, Davis finds that jazz players can move from a “dependence on 
articulating the form” to “using the form, realizing that [the tune structure] is the 
beginning of something and you have to create something else... They have to do 
more than just keep time, they have to articulate time... They can make melodic 





21 Duncan Watts’s current research shows that the most searchable networks involve
individuals who are neither too one-dimensional nor too scattered. As long as people have at
least two dimensions along which they are able to judge their similarity to others, then small
world networks are possible—people can still find short paths to remote and unfamiliar areas.

22 For more on this topic see chapter seven in Borgo [40].
choices that are at least as strong as the melody that was there before.” Even as
students become more proficient, however, Davis reminds them that, “You have to 
get beyond your mannerisms to really come up with a musical idea as opposed to a 
catalog of what you do.” 

Problem-finding approaches are equally important when improvising in a group, 
since it is often impossible to determine the meaning of an action until other 
performers have responded to it. The particular challenge of group improvisation, 
then, is that each performer may have a rather different interpretation of what is going 
on and where the performance might be going. In other words, intersubjectivity is 
intrinsic to group performances. For Sawyer [3], however, “The key question about 
intersubjectivity in group creativity is not how performers come to share identical 
representations, but rather, how a coherent interaction can proceed even when they do 
not” (p. 9). In part, this is possible because individuals shape a performance on both 
denotative and metapragmatic levels; they simultaneously enact the details of a 
performance and negotiate their interactions together. Even if a singular meaning to 
performance always remains elusive, participants can shape the ways in which their 
various interactions unfold. 

Davis stresses that it is critical that students learn the difference between listening 
and following: “In order to listen, you don’t necessarily follow... You try to construct 
something that coexists or works well with something else—not necessarily this tail- 
wagging-the-dog thing where you just follow someone.” For Davis, “Listening is 
knowing what someone is doing and using it in a constructive way, as opposed to 
mimicry, just trying to demonstrate that you are quote-unquote listening.” The very 
notion that everything could be heard, processed, and immediately responded to 
during complex moments of improvised music is, by itself, far too facile. 
Trombonist/composer/scholar George Lewis [11] describes a type of “multi- 
dominance” in improvised music—an African-American aesthetic by which 
individuals articulate their own perspectives yet remain aware of the group dynamic, 
ensuring that others are able to do so as well. 

Yet exactly how group flow is cultivated in improvised performances can remain 
rather mysterious. Describing his general approach to me, contrabassist Bertram 
Turetzky remarked: “One way when I play free music, I try not to think of anything. I 
respond or I initiate. And whatever my intuitions tell me, I go with them... Other 
times in free music, I play with people perhaps I don’t know. And I say, well, the last 
one started soft and slow and got faster and then went back... So all of a sudden I 
start banging things and doing all kinds of stuff... For some people, I think you have 
to be very rational. And you perhaps have to have an idea of where you think it could 
go, and be the quarterback.” Turetzky acknowledged that establishing a proper group 
rapport can be difficult “if someone has a big ego and wants to make everything 
compositional.” When he perceives that the group flow is in jeopardy, at times he 
may adopt a third strategy: “If there are three or four people, maybe I’ll stop a little
bit and let them see what they want to do. If there is a mess, let them sort it out. Let 
them start something and maybe I can support them.” 

Certain exercises employed by improvising actors may be useful for improvising 
musicians. For instance, dramatist Keith Johnstone [30] believes that, “Humans are 
too skilled in suppressing action. All the improvisation teacher has to do is to reverse
this skill and he creates very gifted improvisers. Bad improvisers block action, often
with a high degree of skill. Good improvisers develop action” (p. 95). Improvising 
actors are taught that, instead of denying or rejecting what has been previously 
introduced into the dramatic frame, they should accept the actions/words of others as 
dramatic “offers” and, in turn, add something to the dramatic frame, i.e., present a 
complementary “offer,” or “revoice” an existing “offer.” The inherent challenge is to
avoid circumscribing or over-directing the group flow. This does not, however, 
preclude the possibility of swiftly changing dramatic or musical directions, as the case 
may be, but care should be taken to do this in a way that keeps previous developments 
available for future moments of reference or expansion, a practice called “shelving”
by improvising actors. Of course, evaluating exactly when “revoicing” or “shelving” 
the “offers” of others has been successful can be a tricky proposition. And the 
inherent complexity, polyphony, and polysemy of music can make this even more 
challenging. At heart, however, these exercises in improvised theater, and similar 
ones adopted by musicians, are designed to improve one’s ability to listen and 
remember, so that the ongoing group development will be stimulated rather than 
curtailed. 

Compositional schemes and strategies are often employed to help organize 
improvised music, either prior to, or in the moment of, performance. Deciding how 
or how much to organize performances, here again, becomes a tricky endeavor. John 
Zorn’s Cobra may be the best-known “game piece” for improvising musicians. 
Making a distinction between his work and conventional notions of composition, 
Zorn remarked: 


In my case, when you talk about my work, my scores exist for improvisers. There 
are no sounds written out. It doesn’t exist on a time line where you move from one 
point to the next. My pieces are written as a series of roles, structures, 
relationships among players, different roles that the players can take to get 
different events in the music to happen. And my concern as a composer is only 
dealing in the abstract with these roles like the roles of a sports game like football 
or basketball. You have the roles, then you pick the players to play the game and 
they do it. And the game is different according to who is playing, how well they 
are able to play...24


Because performers’ attention is already engaged in complex ways during performance,
others worry that highly involved schemes for structuring improvisation can hinder
rather than assist the natural development of the music. For instance, performer/scholar
Tom Nunn [32] writes: “When improvisation plans are complicated—no matter how
clear or well explained they might be—the attention of the improviser is constantly
divided between the plan and the musical moment, having to remember, or look at a 
score, a graphic, or even a conductor. What often happens is that both the plan and 
the music suffer from this divided attention” (p. 162). 

23 For a related treatment regarding jazz improvisation, see [31].
24 Quoted in [10], p. 233.

In a recent interview, contrabassist Mark Dresser discussed with me the challenges
inherent in structuring pieces for improvisers: “Composition is often about control.
You have to build [improvisation] in. I’ve built pieces that have been little prisons,
too. You’re looking at something really specific.” But he added, “It’s a trip to find
the balance. You try to find combinations where you have real focus and 
condensation, and points of real expansion. For me, it is all about being a complete 
musician. All of those things are interesting. At different points in the evening I try 
to have all of those things. It’s funny, though, when you get in the composer’s head
it’s really hard to let go of trying to control it or to create this kind of balance.” 

Even compositional strategies that have the sole intent of facilitating group 
improvisation during performance can backfire. Referring to Butch Morris’s 
extensive system of conducted gestures designed to help organize improvised 
performances, Dresser commented: “I’ve seen the conduction thing be a disaster with 
people who just don’t like to be controlled.” Without pre-conceived strategies, 
however, there is an ever-present danger that improvised music will fail on its own. 
This danger may also increase with the size of the group. Philip Alperson [33] writes: 
“As the number of designing intelligences increases, the greater is the difficulty in 
coordinating all the parts; the twin dangers of cacophony and opacity lurk around the 
corner” (p. 22). 

This makes those moments when group improvisation is deemed successful all the 
more powerful. When I interviewed bassist Lisle Ellis, he confided: “A lot of
improvised music I don’t think is very good music. But man, when it hits, it’s
extraordinary! That’s what I’ve spent my life doing—waiting for those moments when
it really lines up—to find a way to have some consistency in it. Some days I think I
really know how to do that and other days I think I don’t have a clue.” In a telling 
aside that highlights this balancing act of harnessing creativity, Ellis remarked, “I’ve 
got to write more stuff down. I’ve got to write less stuff down.” 

When discussing improvisation and composition, it can be particularly challenging 
to avoid thinking in terms of simple dichotomies while at the same time remaining 
leery of equally facile truisms about the music. Only with dualistic thinking, which 
presents two things as opposed and forces one to choose between them, are preparing 
for something in advance and the leap of freedom into the unforeseen viewed as 
antithetical or incompatible. Dresser finds that, “Within control there are lots of 
possibilities for freedom.” And discussing his time spent as a young man in classes
with Muhal Richard Abrams at the AACM school, George Lewis [34] writes: 
“Improvisation and composition were discussed as two necessary and interacting 
parts of the total music-making experience, rather than essentialized as utterly 
different, diametrically opposed creative processes, or hierarchized with one 
discipline framed as being more important than the other” (p. 86). Dresser recounted 
a telling moment during his first tour with Anthony Braxton’s quartet that resonates 
with this issue: “The only time that Braxton criticized the quartet, he said, ‘Well, you 
guys are playing the music correctly, but you’re just playing it correctly.’ The
criticism was you are being too dutiful, you’re not taking a chance. That was the day 
that the format of the music actually changed, from being a solo-based music to an 
ensemble music. All of a sudden, the nature of the music became different. That 
moment articulated when the group came into its own.”


5 Final Thoughts


Why do people tend to assume that systems are organized either by lead or by seed? 
In part, this is undoubtedly due to the fact that many if not most of our social 
institutions and artistic creations are organized in this way. Yet an extreme reliance 
on centralized organization and centralized metaphors in the past has led to a situation 
in which many people are unwilling or unable to imagine systems organizing in a 
decentralized fashion.25 When people hear music they tend to assume a composer, a
leader, or, when that music is improvised, they tend to assume that creativity emerges 
solely from the individual. In many cases these intuitions may be right. But one of 
the more encouraging aspects of much contemporary experimental music is that it is 
not always easy or even possible to know if a particular instance of music was or was 
not composed ahead of time.26 And the generative power of computers is blurring
these lines even further. Perhaps most encouraging of all, however, is the fact that 
creativity is increasingly being viewed as a web of network interactions operating on 
all scales, reflecting individual, social, cultural, and historical dimensions. 

There are many compelling reasons to view artistic behavior not as some special 
kind of activity cut off from the rest of human behavior but rather as much an 
adaptation to the environment as any other human activity. Since a primary drive of 
human beings is to perceive the environment as comprehensible and to make 
successful predictions about the future, we have developed a cognitive/sensory 
orientation that filters out any data that is not relevant to the needs of the moment. 
But since such an orientation does not prepare an individual to deal with a particular 
situation but only with a category, or kind, or class of situations, much of the 
suppressed data may very well be relevant. The arts in general, and music in 
particular, may serve the function of breaking up entrenched orientations, weakening 
and frustrating our “tyrannous drive to order,” so that humans are better able to deal 
with change, complexity, and chaos.27

Improvisers engage the unforeseen; they offer the experience of disorientation.28
They look to find problems, rather than to solve them. Improvised music also
reminds us that the notion of “art” is most appropriately located not in the “work”
itself, but rather in the perceiver’s role, a role that involves maintaining a
search-behavior focused on discontinuities. Emotional affect is not intrinsic to the “work”,
but rather is dependent on a successful performance of the perceiver’s role; emotion is
the result of a discrepancy between expectation and actuality.29 Perhaps most
importantly, improvising music together allows participants and listeners to
experience and explore complex, decentralized, interconnected, and emergent social
dynamics.

25 Decentralization may be biologically coded for ants and other social insects, but it does not
seem to be as natural or automatic for humans. Or it may simply be that, because we are within
the system, we remain unaware of its emergent properties, just as individual bees and ants may
be unaware of their group’s emergent social organization (although this hypothesis is difficult if
not impossible to test). For lucid writing on this subject see [35] and [36].

26 Although this blurring may be artistically encouraging, we still need to be aware of cultural
assumptions that accompany our notions of musicking. Eddie Prévost [37] recounts an AMM
performance after which a woman came up to the musicians and remarked how moved she had
been by the music. Once she learned that the group had been improvising rather than playing
from a memorized score, she not only doubted their artistic and intellectual integrity, but she
was forced to question her own powers of discrimination. “How had it been possible for her to
enjoy and admire such work when its practice had been so... primitive.”

27 For some prescient writing on this subject see [38].

28 The Latin roots of the word improvisation are in- (not) and provisus (foreseen).

Recent work in the cognitive neuroscience of music concerned with the role that 
music plays in human evolution and development supports this view rather well. Ian 
Cross [41], a leading researcher in this still nascent field, argues that music’s 
nonefficaciousness—its general remove from immediate concerns for survival (from a
strict biological perspective)—makes it especially well suited to testing out aspects of
social interaction, while its polysemy—its ability to produce multiple meanings—
endows us with the multipurpose and adaptive cognitive capacities that make us 
human. In less technical language Cross writes: “[M]usic can be both a consequence 
free means of exploring social interaction and a ‘play space’ for rehearsing processes 
that may be necessary to achieve cognitive flexibility” (p. 51).30 People cooperating
in a musical activity need not find the same meaning in what they do in order for the 
musical event to assist them in acquiring and maintaining the skill of being a member 
of a culture. As Cross sees it, “The singularity of the collective musical activity is not 
threatened by the existence of multiple simultaneous and potentially conflicting 
meanings” (ibid.). Through continual engagement with art—viewed as the successful
performance of the perceiver’s role—we may in fact be better prepared to survive and
flourish in our increasingly interconnected, and therefore interdependent, world. 

It is interesting to note that two of the hottest current topics for organizational 
design are the sciences of complexity and jazz music. Both domains emphasize 
adaptation, perpetual novelty, the value of variety and experimentation, and the 
potential of decentralized and overlapping authority in ways that are increasingly 
being viewed as beneficial for economic and political discourse. Robert Axelrod and 
Michael Cohen [43] see in the move from the industrial revolution to the information 
revolution a powerful shift from emphasizing discipline in organizations to 
emphasizing their flexible, adaptive, and dispersed nature. And Karl Weick [44], in a 
special issue of the journal Organization Science devoted to an exploration of “the 
jazz metaphor,” finds that the music’s emphasis on pitting acquired skills and pre- 
composed materials against unanticipated ideas or unprogrammed opportunities, 
options, or hazards can offset conventional organizational tendencies towards control, 
formalization, and routine. In a response to the heavy reliance by journal contributors 
on swing and bebop as the source of their jazz metaphors, Michael Zack [45] outlined 
ways in which free jazz might propel discourse even further into the realm of 
emergent, spontaneous, and mutually constructed organizational structures. 

29 See Joseph Goguen’s work in [39] and in the co-authored chapter of [40].

30 The notion of music as a “consequence free” activity is somewhat problematic, but it is used
here in the biological sense that music, in almost all cases, does not by itself do physical harm to
humans. Since social interactions play an important role in our cognitive development, it should
also be clear that these two properties cannot be easily divorced from one another. The notion
of “play” in relation to improvised music is taken up in [42].

Are there lessons from improvising music that can help us to understand, or at least
to cope with, the complexity of our world? Improvising music makes us aware of the
power of bottom-up design, of self-organization. It operates in a network fashion,
engaging all of the participants while distributing responsibility and empowerment
among them. Networks facilitate reciprocal interactions between members, fostering 
trust and cooperation, but they also can concentrate power in the hands of a few. 
Under the best of circumstances, improvising music encourages social activities that 
support the growth and spread of valued criteria through the network. For instance, 
improvisers tend to value diversity, equality, and spontaneity and often view their 
musical interactions as a model for appropriate social interactions. Tom Nunn [32] 
writes: “Free improvisers are important to the society in bringing to light some 
fundamental values and ideas, for example: how to get along; how to be flexible; 
how to be creative; how to be supportive; how to be angry; how to make do. So there 
is a social and political ‘content’ in their music that seems appropriate today, though it 
may not usually be overt” (p. 133). 

As we continue to explore ways of improvising music, we should look for ways to 
assist would-be cooperators in interacting more easily and more frequently. The 
robustness and equity of a network system is a direct result of the range and number 
of interactions. We should also look to maximize participation from the fringes, 
rather than the core. In complex systems, a healthy fringe speeds adaptation, 
increases resilience, and is almost always the source of innovations. For instance, 
nearly every new style of American popular music has emerged from the periphery— 
from a localized, and often disadvantaged, community—to capture the attention of 
national and international audiences (at which time much of the music’s original 
meaning may of course be sacrificed). 

Fostering improvising music has the potential to overcome the inherent problems 
of a slow-moving traditional hierarchy, providing an effective way to handle 
unstructured problems, to share knowledge outside of traditional structures, and to 
inject local knowledge into the system. Improvising music also ensures that the 
cognitive models and metaphors we live by remain flexible, while it reminds us that 
our flexibility to learn and adapt is grounded in the bodily and the social. Without
cultivating this embodied, situated, and distributed approach to music making, and 
without maintaining a healthy reverence for uncertainty, we can build complicated 
music systems, but not complex ones. 

Complex systems must strike an uneasy and ever-changing balance between the 
exploration of new ideas or territories and the exploitation of strategies, devices, and 
practices that have already been integrated into the system. In other words, complex 
systems seek persistent disequilibrium; they avoid constancy but also restless change. 
Perhaps in a way similar to democracy, which along with jazz music has been a 
powerful symbol of liberation and resistance to oppression, improvising music 
teaches us to value not only cooperation, but also compromise and change. In 
politics, as in music, a notion of the “common good” is bound to mean different 
things to different individuals and groups, such that the democratic experience is one 
of not getting everything you want. In a similar way, the value of improvising music 
lies not in the outcome of a single performance, but rather it emerges over time 
through continued musical and social interactions. Improvising music together does 
not necessarily produce optimal outcomes, but the decision to improvise music 
together does.


Abstract. A personal account of how Joseph Goguen and I came to
work together and of the influence that Tibetan Buddhism had on us 
and on our collaboration. A brief discussion of some neurological exper- 
iments using meditators and how Goguen’s work connects Buddhism, 
computing, and cognition. 


Mind, Heart and Meditation 


Joseph Goguen has an extraordinary mind and a big heart. My friendship with 
him is long and deep, and it has affected my life in two major ways since I 
met him in 1974. First we worked closely together on modularity and program 
specification for a dozen years or more and continued to have many conversations 
about computing until my retirement in 2000. During this time I learnt a lot 
from Joseph about category theory and its applications to computing. Second 
he introduced me to Buddhist practice and thought under the guidance of a 
Tibetan teacher, Chogyam Trungpa Rinpoche, with whom we both studied until 
his death in 1987. 

I will say a little about how I met him and what our motivations were, but 
mainly I would like to describe a part of his life, and mine, which will be less 
familiar to most readers, namely his interest in Buddhism. Later this year it 
will be the thirtieth anniversary of Joseph giving me instruction in the medita- 
tion technique which he learned from Chogyam Trungpa. I am still practicing 
it regularly, and now I spend some months each year in solitary or group medi- 
tation retreats. I have been thinking particularly about the connection between 
Buddhist empirical knowledge and science. 


Meeting and Working with Joseph 


In the early seventies I was working at Edinburgh University on programming 
languages and correctness proofs, and I had learned some elements of universal 
algebra and category theory. When I was in the US I arranged to visit Jim 
Thatcher, who had written a paper with Wright on categories and automata. It 
turned out that Jim was more interested in stopping the Vietnam War than in 
category theory, but both ideas were fine by me. I gave him a hundred dollars 
to stop the war, and I got him a visiting fellowship to Edinburgh. One day 
Jim told me that a colleague of his, a very clever mathematician named Joe 
Goguen, would visit us in Edinburgh. I was quite terrified of meeting a clever
mathematician, but found to my surprise that this Joe person did not frighten
me at all. So next year in 1975 when I was at a conference in Los Angeles I fixed 
up to stay over the weekend with him. It was a very exciting weekend, and at 
my invitation Joseph spent that summer in Edinburgh on a Science Research 
Council Visiting Fellowship. This was the start of our technical collaboration. 

We were both interested in software and program correctness proofs, also in 
how to apply proof techniques to larger programs by imposing a modular struc- 
ture on them. Joseph suggested we think about modularity of specifications, as 
that might be easier than thinking about programs. This led to our development
of a specification language called Clear. To give it semantics we used category 
concepts to explain parameterised specifications and ways to combine them. It 
seemed that the parameter mechanisms and the ways of combining them should 
not depend on the particular specification language, so we came up with the 
categorical concept of an institution to abstract away from the underlying lan- 
guage, which might be equational logic or predicate calculus or whatever. This 
was done while we were visiting each other in Edinburgh and Los Angeles, two 
very contrasting environments. I learnt more about categories. In LA, I learned 
about going out for breakfast. We drew diagrams on paper napkins in the Pan- 
cake House, then in the sand on Santa Monica beach. Once when we thought 
we had had a good idea we danced down the street, under the suspicious eyes of 
a passing LAPD police car. All this was very exciting and, I believe, productive.
We continued to work happily together on these topics from time to time until 
the nineties. I also had much pleasure collaborating for some years with Joseph’s 
talented son Healfdene, who joined my research group in Edinburgh. 


Creativity and Uncertainty 


Turning to the Buddhist side of Joseph’s life, let me backtrack in time and 
explain why I had started practicing meditation before I met Joseph. 

Studying physics at Cambridge University in the fifties I became fascinated 
by the idea of computing. After various twists and turns, teaching myself with 
help from friends, I wound up as a Research Fellow in Edinburgh University 
in the “Experimental Programming Unit”. I was thinking about programming 
languages, artificial intelligence and lambda calculus, cheerfully staying up late 
at night to write and debug code — by 1966 our unit actually had a computer 
to ourselves, the only one in the University. 

Now I found myself writing papers and taking part in workshops, part of a 
critical community. After a while I realized that the joy of creativity had become 
tinged with competition and with doubt whether people would think that I was 
doing well enough. So it seemed that external activity was not enough; I also
had to deal with my own mind. Like many people I had long been interested in 
mind, indeed part of the attraction of computers was the hope that they could 
give us insight into the workings of our minds. So meditation was interesting 
for both personal and intellectual reasons. It promised an investigation of mind 
from the inside.


Joseph and Tibetan Buddhism


In the early seventies Joseph met and studied with an unusual man, Chogyam 
Trungpa Rinpoche, an accomplished and respected scholar, meditation master 
and teacher who fled Tibet in 1959 at the age of nineteen and wound up in the 
United States, via India and Britain. Trungpa was a poet, artist and practical 
joker who had a profound impact on many of those who met him. 

What Trungpa taught was, in Western terms, somewhere between psycho- 
therapy, philosophy, life coaching, religion and how to become a kind and open- 
minded person. It was the result of two and a half millennia of empirical investi- 
gation of the human mind from the inside, and it included a practical technology 
for training the mind. But it had been nurtured in Tibet in isolation from the 
rest of the world, and it was expressed in a language which was little known 
and hard to translate, with its own rich technical vocabulary. Trungpa opened 
up to Western culture, learned English and translated not just the words but 
also the concepts. He also developed a number of non-verbal ways of getting his 
message across, for example by teaching former hippies who had been attracted 
to his teachings in the seventies to decorously dance the Viennese waltz. He 
first indulged his Western students, teased them and then demanded extraordi- 
nary effort and discipline. Extraordinary effort and discipline was nothing new 
for Joseph: in 1975 he spent twelve weeks at a Buddhist ‘Seminary’ taught by 
Trungpa, an intensive regime of meditation and study. 

It was a few weeks after this that I stayed with Joseph in Los Angeles for the 
first time as recounted above. He took me along to the local Buddhist centre and 
played me a taped talk by Trungpa. Someone asked a question about Mozart 
and to my surprise Trungpa seemed to admire his music — I was curious about 
this Tibetan guru with an appreciation of eighteenth century European music. 
The following year when Joseph was spending a second summer in Edinburgh I 
asked him to teach me the basic Buddhist meditation technique. 

In 1981 Joseph attended a second twelve-week seminary, and this time my 
wife and I were there too (our two eldest daughters followed in later years). Over 
three hundred students and staff took over an off-season hotel in the Canadian 
Rockies. Periods of meditation, 7 a.m. to 9 p.m. with brief breaks after meals, 
alternated with periods of teaching, study and more meditation, running later 
still if Trungpa was teaching. Our family continued to practice what we had 
learnt, and we were very grateful to Joseph and to Trungpa Rinpoche. 

We were taught practices to pacify the mind, open up our awareness and 
develop kindness and compassion to others. The main point was to pay moment 
by moment attention to what was actually happening in our minds, wandering 
thoughts and emotions: irritation, curiosity, regret, benevolence or whatever — 
touch it and let go. This was a sort of animal behaviour investigation with the 
animal being our own mind. Beyond this there were techniques aiming to change 
one’s mental processes by prescribed exercises of the imagination. In particular 
we worked to diminish the “destructive emotions” of anger, passion/addiction, 
ignorance, envy and arrogance. (A selection of Trungpa’s talks is available in [8].)

Since then Joseph and I have both pursued the Buddhist path under the
direction of Trungpa and, after his death, under his son Mipham Rinpoche. We
have many times shared our ideas and experiences. So now let me sketch some 
ways in which this could connect with the scientific side of our lives in neuro- 
science, psychology of emotions or cognitive linguistics, also with Joseph’s own 
contribution to studies of consciousness. 


Connections Between Buddhist and Western Explorations of the 
Mind 


The two and a half thousand year old culture which we call Buddhism developed 
psychological models for the mind and the processes of perception and action, 
based on internal meditative investigations and the results of many different 
methods of training the mind. These methods are essentially technical, complete 
with manuals, and not based on any kind of supernatural interventions. They 
are, of course, not exclusive to Buddhism, witness other Indian traditions, Sufis 
and Christian contemplatives. 

Stephen LaBerge, who has conducted experiments on lucid dreaming and
compared his techniques with those of the Tibetan tradition [4], comments:


The effectiveness of a psychological technique can be tested by careful 
observers of the contents of consciousness without the need of technology
other than a well-trained mind and a disciplined body. In contrast,
testing the validity of an explanation of that technique may require the 
extremely sophisticated technology needed for the visualization and mea- 
surement of neural activity. 


Richard Davidson at the University of Wisconsin at Madison was able to 
examine with an fMRI scanner and with EEG the neural activities of advanced 
meditators using the Tibetan methods. In a first experiment “Lama Oser” (a 
pseudonym), a Westerner, who has been a Tibetan monk for about thirty years, 
was tested using six different meditation practices, one minute each with a pause 
of one minute between them. Oser’s brain showed clear distinctions between 
these different meditations and the pauses. His sharp shifts between different 
activities were exceptional. In the EEG tests, when meditating on compassion, 
his brain showed “a dramatic increase in the electrical activity known as gamma” 
in an area of the brain associated with “happiness, enthusiasm, high energy and 
alertness” [3, pp. 1-13]. In a later experiment, Davidson was able to confirm
the EEG results, comparing a group of experienced meditators with a group of 
novice meditators [1, 7].

Paul Ekman of the University of California at San Francisco, an expert on 
the science of emotion, tested the ability of Lama Oser and another very ex- 
perienced Western meditator (each had done a total of two or three years of 
solitary retreats). In one test he asked them to identify “microemotions”, facial
emotions such as fear or contempt, which only appear for a fraction of a second 
and are impossible to control deliberately, showing them videotapes of flashes
of one fifth or even one thirtieth of a second of the faces. They both showed
ability two standard deviations above the norm, far higher than any of the five 
thousand other people tested, including policemen, psychiatrists and even Secret 
Service agents. Such a diagnostic ability for emotions would be helpful to guide 
students in the transformative practices of the Buddhist tradition [3, pp. 13-21, 
123-131]. 

Turning to psychological models rather than meditation techniques, the Ma- 
hayana tradition of Buddhism emphasizes the concept of “emptiness” (Sanskrit 
“shunyata”). It is puzzling how this relates to Western traditions. The Mad- 
hyamaka approach tries to show the inadequacy of our conceptual system by 
reductio ad absurdum arguments. Some of these seem to deal with paradoxes 
reminiscent of Zeno’s paradoxes, for example ones about movement, which have 
been clarified by Western work on calculus and limits. But it seems to me that 
these arguments should be directed, not at our mathematics or physics theories, 
but rather at the built-in conceptual reasoning systems which are common to 
all humans. These systems are of interest to cognitive science and cognitive lin- 
guistics, and Joseph has long drawn my attention to the work of George Lakoff 
and his associates on “metaphor”. The idea here is that our conceptual models 
of the world start from inbuilt sensory-motor conceptual schemes, such as the 
idea of containment (for example “a triangle inside a square”) or “the path of a 
movement with starting and finishing points”. From these, other more abstract 
concepts, such as “being in trouble” or “on the road to ruin”, are derived by metaphors 
(mappings or morphisms). A derived concept can have its meaning determined 
by several such metaphorical maps. Lakoff and co-authors have applied these 
ideas to human understanding both in philosophy [5] and in mathematics [6]. 
All this is reminiscent of the early work by Joseph and myself on defining a 
specification language, Clear, in terms of theories and theory morphisms. In 
the last few years Joseph has been working on the construction of conceptual 
systems using “semiotic morphisms”. (For other references see his website, 
http://www.cs.ucsd.edu/users/goguen/projs/semio.html.) 

Another connection is Joseph’s work as founder and editor of the Journal of 
Consciousness Studies, which has fostered growing interest in this aspect of mind 
and published work from many disciplines by philosophers, psychologists, neuro- 
scientists, linguists and practitioners of the ancient traditions of contemplation 
and meditation. 

The Buddhist tradition is just one of many wisdom traditions, spiritual, 
psychotherapeutic and medical. We need to keep these alive. Using the analytical 
methods and tools of science, some elements of these traditions will be better 
understood, so that they can take their place as part of the global culture of 
accepted knowledge. 


Conclusion 


I hope that this personal view will have illuminated one less public side of 
Joseph’s life journey and given some feel for how it coheres with his extensive 
and admirable work in computing. I count myself very fortunate to have 
shared some part of that journey with him. 

Abstract. The works of Joseph A. Goguen and Samuel R. Delany address wide 
arrays of "big" issues in philosophy: identity and qualitative experience, 
semiotic representation, and the divergence between meaning in formal systems 
of understanding and in everyday lived experience. This essay attempts to draw 
out some of the parallels between the works of these two authors, in particular 
regarding metalogic, qualia, and identity, using illustrative examples from the 
works of both authors. Their works exhibit parallel dual strands: (1) a desire to 
rigorously and precisely map out these fundamental issues, and (2) a desire to 
acknowledge and embrace the ambiguities of phenomenological experience and 
its divergence from any formalizable theory. In the end, addressing such a wide 
range of issues has required both authors to develop and adopt new discourse 
strategies ranging from rational argumentation to mathematics, from religious 
and philosophical commentary to speculative (science) fiction and poetry. 


1 Introduction 


A perusal of any dozen pages from the Summa reveals Slade's 
formal philosophical presentation falls into three, widely differing 
modes. There are the closely reasoned and crystallinely lucid 
arguments. There are the mathematical sections in which symbols 
predominate over words; and what words there are, are fairly 
restricted to: “... therefore we can see that...,” “...we can take this 
to stand for...,” “...from following these injunctions it is evident 
that...,” and the like. The third mode comprises those sections of 
richly condensed (if not impenetrable) metaphor, in language more 
reminiscent of the religious mystic than the philosopher of logic. 
For even the more informed student, it is debatable which of these 
last modes, mathematical or metaphorical, is the more daunting. [8] 
— Samuel R. Delany, discussing the work of the fictitious 
metalogician Ashima Slade 

I'm afraid that the reader may have found this paper rather a long 
strange trip, starting from the practice of software engineering, 
then going to category theory, and eventually ethics, passing 
through topics like equational deduction, various programming and 
specification paradigms, semiotics, theorem proving, requirements 
engineering and philosophy. 


From another perspective, this paper can be considered a diary 
from a very personal journey moving from a mathematical view of 
computing, through a process of questioning why it wasn't working 
as hoped, to a wider view that tries to integrate the technical and 
social dimensions of computing. This journey has required a 
struggle to acquire and apply a range of skills that I could never 
have imagined would be relevant to computer science. Always I 
have sought to discover things of beauty — “flowers” - and present 
them in a way that could benefit all beings, though of course I don't 
expect that very many people will share my aesthetics or my ethics. 
[15] 

— Joseph Goguen, excerpts (slightly reordered) from an 
autobiographical essay tracing the trajectory of his research career 


The aroma of algebraic flowers motivates this paper. Joseph Goguen has used the 
metaphor of flowers to describe the strivings of his own work because of the 
parsimonious beauty it is possible to evoke with elegant formalizations in 
mathematics. For him the essence of these “flowers” is rooted in compassion and a 
true desire to benefit humanity. Yet, Goguen’s metaphor for his work is also one of 
loss. His autobiographical essay “Tossing Algebraic Flowers Down the Great 
Divide,” [14] suggests that his beautiful work is tumbling downward into a dark 
crevasse between technical and social scientific or humanistic disciplines, perhaps 
only to be discovered at an unknown time, or perhaps never. 


It is not so! Goguen's algebraic flowers garland a gossamer network of bridges 
between diverse fields: computing, mathematics, philosophy, sociology, semiotics, 
narratology, and more. Though perhaps more researchers are familiar with Goguen's 
work on the technical side of the divide, I intend to highlight the bridge his work 
builds from computing and mathematics to humanistic and artistic issues. Personally, 
this bridge has been a profound influence on my work. My academic training is in 
logic, interactive media art, and computer science. In the course of these studies, I 
became interested in new forms of interactive narrative that take advantage of the 
affordances provided by computing. I came to feel that a powerful direction in 
interactive artwork is to allow user interaction to affect meaning with narratives, and 
with Professor Goguen's guidance as my advisor this intuitive direction transformed 
into specific goals, for example generating new metaphors or constructing narratives 
as users provide input. Toward this end Goguen's algebraic semiotics and his 
approach to user-interface design were a revelation. He is an expert mathematician 
dealing with semiotic issues also addressed by art theory. He is a computer scientist 
who espouses the importance of narrative. Underneath this all is a concern for the 
social, ethical applications of his work. Because he has not compromised his work 
toward either side of the divide, Goguen's feeling of loss regarding this work is 
probably due to the limited number of people on either side of the divide interested in 
seriously addressing the issues and methods of greatest import on the other side. I 
have described my own background only because I live directly in the center of the 
divide. For people like me, Goguen's work in these areas is of great importance both 
for its application and example. It can be used directly for artistic technical practices 
and it is an example of what is possible to achieve when combining methods from 
diverse fields with rigor and a careful attention to the values implicit in them. This 
essay is intended to convey this important aspect of Goguen's work by focusing on 
several particular topics in his oeuvre and contrasting them with the work of another 
author who has inspired me, Samuel R. Delany. 


The title of this paper refers to my attempt to find sympathy in the works of these two 
eclectic and profound authors. The planet Neptune’s largest moon is Triton, here 
alluding to the title of Delany’s science fiction novel Trouble on Triton: An 
Ambiguous Heterotopia. The idea for the thesis of this paper was inspired by the 
character mentioned in the Delany quote above from that same novel. In the character 
Ashima Slade, using the idiosyncratic genre of “critical fiction” which allows 
meticulous commentary on his fictitious author, his lectures, and his theory, Delany 
has constructed an astounding parallel counterpart for Goguen. The parallel is 
astounding because of the amazing correspondence of topical concerns that exist 
between Delany’s essay, and the content and style of his character Ashima Slade’s 
Harbin-Y Lecture “Shadows”¹ (on the topic of the “Modular Calculus,” which grew out 
of “metalogic”) [8] [10]. 


Goguen has never been one to shy away from “big” issues of human existence. 
Likewise, as a science fiction and fantasy author constructing civilizations, ancient 
and futuristic, in part to illuminate sociological points, Delany addresses major 
philosophical themes. Both are employed as university professors, Goguen in 
computer science and Delany in English and creative writing, yet the works of each 
extend well beyond their disciplinary boundaries. Indeed in the quote above Goguen 
expresses that his work has taken him on a journey through exotic disciplinary locales 
ranging from category theory to ethnomethodology, and his work also ranges to 
Buddhist thought and poetry and fiction writing on occasion. Similarly, Delany has 
commented on a wide range of concerns including semiotics, paraliterature, cultural 
theory, discourse analysis, gender studies, as well as producing meditations on 
mathematics and technology. These lists of interests of the two authors are not 
exhaustive, but they serve to highlight the difficulties, and pleasures for those 
sympathetic to deep interdisciplinary thought, in elucidating parallels in two prolific, 
singular authors. 


There are many specific parallels in the works of Goguen and Delany. Mathematical 
metaphors are pervasive in Delany’s oeuvre and metalogic takes a prominent role in 


1 Robert Elliot Fox tells us in his book Conscientious Sorcerers that “the title in the first lecture 
of the series, ‘Shadows,’ is one that Delany himself used for a speculative/critical essay. As 
Slade’s fictitious editor tells us, Slade took the title ‘from a nonfiction piece written in the 
twentieth century by an author of light, popular fictions.’” [10] 

Trouble on Triton in particular. By the same token, identity and difference are major 
themes in Goguen’s work. Often he addresses such concerns through very abstract 
mathematics such as the theory of institutions which allows for the comparison of 
logics (a type of metalogic). Though he is not as explicit about politicized social 
identity in the same sense as Delany, Goguen is also concerned with the relationship 
of these themes to everyday lived experience. This can be seen in his work on qualia. 
In phenomenology, philosophers use the term “qualia” to describe introspectively 
accessible feelings of everyday life that are irreducible to objective characteristics. 
[25] Goguen has carried out a set of experiments relating qualia to the issue of 
identity and difference. Similarly, while many artists are interested in exploring the 
qualitative experiences of life, Delany creates rigorous literary thought experiments 
that also seem to address the qualia of identity, in his case usually experiences of race, 
gender, sexual orientation, and similar issues of social identity. The care with which 
Delany constructs these detailed explorations is exemplified below in Section 2.1 as 
he uses the metaphor of metalogic to make very specific observations about the nature 
of race. Finally, both authors are brazenly concerned with mapping out meaning in 
all of its modularity and nuance. They are unified in this concern as they both draw 
upon a broad range of traditions from science, mathematics, literature, and social and 
cultural theories to comment upon some of the most fundamental issues we, as 
humans, experience in life. 


The task of investigating the parallels above is quite worthwhile. It serves to 
highlight contributions of both Goguen and Delany that perhaps are less well-known 
than their main contributions to their fields and, more importantly, it provides 
insights into issues such as (1) identity and qualitative 
experience, (2) semiotic representation, and (3) the divergence between meaning in 
formal systems of understanding and everyday lived experience. These three issues 
are intended to focus this paper (as opposed to representing a comprehensive outline 
of shared concerns between the authors). This is not meant to be a complete survey of 
either author’s work since I intend rather to highlight particularly salient parallels 
between them. Thus, the paper is structured as a series of two case studies followed 
by discussion and a conclusion. 


The first case study is centered on Delany’s description of “metalogic,” and the 
“modular calculus” where appropriate, in his novel Trouble on Triton: An Ambiguous 
Heterotopia. The second case study is centered on the philosophical notion of qualia 
in Goguen’s work in several papers [16] [19], and the theory of institutions where 
appropriate. [18] These case studies are unified by a concern with identity, though 
the starting points from which Goguen and Delany consider identity are quite 
different. The case studies are followed by a discussion that highlights the tension 
between both authors’ desires to rigorously map meaning and representation (semiotic 
concerns), and both authors’ realizations that this is a Sisyphean task when confronted 
with the immensity of the real world and human perception of it. The paper 
concludes with an account of the various discourse styles and strategies Goguen and 
Delany use to express their ideas — an account of the artistry of the authors. Their 
discourse styles can be seen as roughly fitting into the same three categories that 
Delany outlines for Ashima Slade’s work: (1) well-reasoned rational argumentation, 
(2) mathematics (in Delany’s case sometimes pseudomathematics used in a 
metaphorical way), and (3) more esoteric, artistic, or even religious/spiritual 
discourse. 


2 Metalogic, Qualia, and Identity 


2.1 Delany on Metalogic and Identity 


Trouble on Triton: An Ambiguous Heterotopia is a novel that tells the story of a self- 
described “reasonably happy man,” living in a futuristic society on Neptune’s moon 
Triton. [3] In truth, this man, a conflicted and pompous anti-hero named Bron 
Helstrom, is far from satisfied. He is ill at ease with his own social identity and 
relationships with others. He is not a likable or sympathetic character, perhaps meant 
to represent the pretentiousness often brought on by experience of the privileges 
accompanying dominant social status. In a world where physique, gender, religion, 
and race are nearly instantly reconfigurable, a world at war with our own planet Earth, 
Bron is constantly concerned with how he presents himself externally, and with 
compensating for his own insecurities. Though largely a meditation on identity, the 
novel features a robust metaphor of mathematics to address the qualitative experience 
of identity and the potential for transformation of identity. 


At one point early in the novel Bron Helstrom takes about seven pages, and many 
elaborate analogies involving colored clouds as spaces of significance, hens and a half 
laying eggs and a half, and the grout between the tiles of the Taj Mahal, to provide a 
brief description of the field of metalogic. [6] Though in the novel’s storyworld 
metalogic is meant to provide a rigorous theory and methodology for problem solving 
in the real world when rules of formal logic are inadequate, it becomes immediately 
clear that Delany’s discussion of metalogic has the issue of identity, and especially 
racial identity, as a subtext. 


The reader is oriented to this subtext as the character Miriamne (to whom Bron is 
about to pontificate on metalogic) responds to Bron’s question on her preference for 
how she takes her coffee: 

“Black,” she said from the sling chair, “as my old lady,” and 
laughed again... 
“That’s what my father always used to say.” She put her hands on 
her knees. “My mother was from Earth — Kenya, actually; and I’ve 
been trying to live it down ever since.” [5] 
Bron’s parents are soon to be revealed as “large, blond, diligent” and “like so many 
others it was embarrassing, laborers.” The discussion is then, at the level of 
nonfictional communication between Delany and the reader [22], a commentary on 
the social situation of a white male, possessed of a strong sense of entitlement and 
oriented primarily toward class distinctions, lecturing a woman of color. This 
commentary plays out metaphorically and metonymically as metalogic is explained 
via several examples that are rich with terms that parallel racialized color such as 
“black,” “white,” “brown,” “pink,” “red,” “tan,” “colored,” and “nonwhite.” 


Specifically, Bron begins by posing a challenge to the “beginning tenet of practically 
every formal logic text ever written, ‘To deny P is true is to affirm P is false’.” The 
color consciousness comes into play when Miriamne responds by mentioning that she 
recalls “something about denying the Taj Mahal is white ... is to affirm that it’s not 
white ... an idea that, just intuitively I’ve never felt comfortable with.” Delany goes 
on to explicate this discomfort by having his character Bron elaborate upon metalogic, 
with a series of arguments using the color of the Taj Mahal as an example. This 
series of arguments clearly could apply as easily to a discussion of the nuances of 
racial identity, moving from a simplistic system of finite (binary initially: white vs. 
nonwhite) classification to a much more complicated system, a “parametal model of 
language,” that stresses the metaphor to the breaking point as exemplified by the 
following quote: 
...he used the fanciful analogy of “meanings” like colored clouds 
filling up the significance space, and words as homing balloons 
which, when strung together in a sentence, were tugged to various 
specific areas in their meaning clouds by the resultant syntax 
vectors but, when released, would drift back more or less to where, 
in their cloudy ranges, they’d started out. [7] 
I now present a summary of the points that Bron makes in his informal discussion of 
metalogic and argument against the idea that to deny P is to affirm not-P: 
(1) Premise: denying the Taj Mahal is white is not to affirm that it is not white 
(2) the significance of ‘white’ is a range of possibilities 
(3) the significance of ‘white’ “fades imperceptibly” through grey to black and 
through pink to red, and even to some non-colors 
(4) accepting that ‘white(Taj Mahal) = F’ implies ‘~white(Taj Mahal) = T’ means 
placing a boundary around an area in the range of significance and to call 
everything in this area white and everything outside of it not-white 
(5) this is already a distortion of what was already mentioned to exist, namely 
fading ranges of color and non-color 
(6) values on the boundary line are unaccounted for 
(7) objects that are piecewise white and not-white are unaccounted for (e.g. the 
Taj Mahal is made of white tiles held to brown granite by tan grout) 
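
Points (4) through (7) can be made concrete with a small sketch. It is only an illustration of the argument as summarized above, not anything found in the novel: the lightness scale, the boundary of 0.5, the surface shares, and all function names are assumptions chosen for the example.

```python
# Illustrative sketch only. The lightness values, the 0.5 boundary, and the
# function names are assumptions, not taken from the novel.

def whiteness(lightness: float) -> float:
    """Graded reading: how strongly 'white' applies, from 0.0 to 1.0."""
    return max(0.0, min(1.0, lightness))


def is_white(lightness: float, boundary: float = 0.5) -> bool:
    """Two-valued reading: a boundary splits 'white' from 'not-white'."""
    return lightness > boundary


def crisp_labels(parts):
    """Set of two-valued labels found among the parts of a piecewise object."""
    return {is_white(lightness) for _share, lightness in parts}


def graded_whiteness(parts):
    """Share-weighted whiteness of a piecewise object."""
    return sum(share * whiteness(lightness) for share, lightness in parts)


# Hypothetical (share of surface, lightness) pairs for tiles, granite, grout.
taj_mahal = [(0.80, 0.95), (0.15, 0.35), (0.05, 0.60)]

if __name__ == "__main__":
    # Points (4)-(6): two nearly identical shades land on opposite sides.
    print(is_white(0.49), is_white(0.51))
    # Point (7): the parts disagree, so neither label fits the whole object.
    print(crisp_labels(taj_mahal))
    # A graded description keeps the information the boundary discards.
    print(round(graded_whiteness(taj_mahal), 4))
```

Even the graded version remains a one-dimensional simplification; it is meant only to show what the two-valued boundary throws away, which is the distortion that points (5) through (7) describe.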


Notice that at this point the “Taj Mahal” in this discussion could have been 
replaced by “racially ambiguous individual” with no effect on Bron’s argument 
(besides making it more socially salient or politically charged). Furthermore, we have 
reached a point where a solution to the problem is to describe the Taj Mahal, or 
racialized person, piecewise as being ‘white’ and also as bearing other discrete color 
signifiers. This is how archaic (though really still practiced, only sometimes less overtly) 
systems of racial identity functioned, with any number of arbitrary discrete color 


2 This is strikingly reminiscent of Duke Ellington’s “Black, Brown, and Beige” suite. [9] 

categories often defined by quantified mixtures of identity³. Indeed I personally grew 
up well aware of the “one drop” rule that holds sway in the United States of America: 
any bit of “black blood” implies blackness (up to a practical limit of 1/16). It is 
common for individuals whose parents are identified as belonging to different racial 
groups to claim “biraciality,” or even more finely grained subdivisions of race. DNA 
testing technologies [2], along with contemporary sociological theories of 
classification admitting the arbitrary nature of race [1], have rendered these piecewise 
and discrete classifications of identity obsolete. With all this in mind, I present 
Miriamne’s response to Bron’s argument so far: “Wait a second: Part of the Taj 
Mahal is white, and part of the Taj Mahal is brown, and part of the Taj Mahal is — ” 
to which Bron responds by continuing his argument as follows: 
(8) the words ‘Taj Mahal’ also have a range of significance 
(9) the range of significance of ‘Taj Mahal’ is not discrete, is not unambiguous, 
and cannot be bounded in a simple two-dimensional model 
(10) the Taj Mahal must be described in terms of continuously valued parameters, 
not discrete perimeters. “Language is parametal, not perimetal. Areas of 
significance space intermesh and fade into one another like color-clouds in a 
three-dimensional spectrum.” 
(11) thus ‘logical’ bounding is dangerous because it implies that boundaries can 
be placed around significance spaces 
(12) natural language can overcome these problems and provide parametal 
descriptions 
(13) rigorous and precise modeling of such phenomena using mathematics 
requires extremely advanced tools of analysis (at minimum metalogicians 
have a simple model with seven coordinates, in practice they often use 
twenty-one, and even this is just an abstract model for visualization that does not 
fully explain the real world, i.e. “real space”) 


At this point, Bron’s argument is not yet complete. The problem is that “significance 
space” has been reified. That is, it is being treated as if it exists in the real world and 
there is such a thing as a “real” significance space to be modeled. Delany’s 
perspective here, as expressed through the character Bron, foreshadows recent 
directions in cognitive science. Bron’s explanation shifts to expressing “how what- 
there-is manages to accomplish what-it-does,” namely how the brain and sensory 
perception are the origins of complicated concepts such as “significance space” and 
other concepts in general. In short, it is almost an embodied perspective of cognition 
[26] (though Delany does not discuss motor operations). In this view “meaning” 





3 The artist Betye Saar expresses this using real historical colorized terms for black people 
found in popular culture and works such as those of the author Langston Hughes. Some of 
these are: “bright/light, cream, fair, marinee, peola, pinky/pink toes, taffy, vanilla, banana, 
butterscotch, café au lait, ginger, golden, honey, peaches, yella/high yella/deep yella, almond, 
caramel, copper, red/red bone, rusty, bark, brownie, brown sugar, cocoa brown/high brown, 
low brown/seal brown/tobacco brown, chocolate/chocolate drop/deep chocolate, molasses, 
walnut, bronze, blackie, blackbird/blackberry, black/blue black/charcoal black/coal 
black/dark black/deep black/lamp black/stove black, crow jane, licorice, midnight/beyond 
midnight, nightblack boy, tar baby.” 

depends upon the fact that humans exist “in a world that is inseparable from our 
bodies, our language, and our social history.” [26] 


From here Bron continues to reformulate the problem, and to describe how metalogic 
allows us to address it. 

(14) the goals of metalogic are to delimit problems and to explore how elements 
in the significance space interpenetrate each other 

(15) metalogical delineation of significance space means examining specific 
human utterances or texts (syntax vectors) to dismiss some areas from 
consideration 

(16) the delimited area is then considered “metalogically valid” 

(17) to deny “meaningfully” that the Taj Mahal is white does not imply, but 
suggests, that it is some other color (and not, for example, “freedom,” “death,” 
“Halley’s comet,” or some other thing that is not relevant) 

(18) the topological representation of not-P can take any shape in the significance 
space, even contained within P (i.e. tangent to P at an infinite number of 
points; in this case it is said that it “shatters P”) 

(19) Summary: metalogic looks at cognitive activations triggered by linguistic 
parole (language as it is actually used) [24], selects a model of this in n- 
dimensional space, and looks at the interpenetration of truth values of 
relevant elements. Only in this context does it make (metalogical) sense to 
say that if the Taj Mahal is not white it is some other color, otherwise, the 
original premise is supported: denying the Taj Mahal is white is not to affirm 
that it is not white 


The remainder of Bron’s lecture merely focuses on mathematical techniques to model 
the significance spaces and industry protocols for doing so. So, stepping back to look 
at what Bron has just explained, meaning in a metalogical framework is embodied 
and triggered via discourse. Modeling meaning requires looking at both its cognitive 
basis and its relationship to language as used in practice. Mathematical modeling 
does not reify meaning, but it allows for precise statements to be made given an 
abstraction, and this abstraction may be fairly complicated with the added advantage 
that it can be modeled computationally in order to get closer to a precise account of 
the fuzzy topic of human meaning. According to Bron, regarding the issue of 
identity, the metalogical framework is shown to be much better than simplistic logical 
formalizations and their simplistic underlying assumptions. 


2.2 Goguen on Identity and Qualia 


Goguen is also engaged in the business of metalogic. His paper with Rod Burstall on 
the theory of institutions begins: 
There is a population explosion among the logical systems used in 
Computing Science. Examples include first order logic, equational 
logic, Horn clause logic, higher order logic, infinitary logic, 
dynamic logic, intuitionistic logic, order-sorted logic, and temporal 
logic; moreover, there is a tendency for each theorem prover to have 
its own idiosyncratic logical system. We introduce the concept of 
institution to formalise the informal notion of “logical system.” 
[18] 
He notes that some “exotic” logic systems have been proposed to handle various 
problems ranging from program construction to natural language. The theory of 
institutions allows comparison between various logics, translations between results in 
one logic and another, and an account of the fact that “many general results used in 
the applications are actually completely independent of what underlying logic is 
chosen.” The notion of an “institution” was introduced to “formalize the informal 
notion of ‘logical system’,” with the requirement that there be “a satisfaction relation 
between models and sentences which is consistent under change of notation.” Thus, 
the use of the prefix ‘meta’ in the case of Goguen and Burstall is traditional in that it 
abstracts to a higher level of generalization than model theory, which describes only 
the satisfaction relationship between syntax and semantics within a logical system. 
The theory of institutions allows logics themselves, many different vocabularies, to be 
compared. It is apparent that the theory of institutions is a rigorously formulated 
mathematical account with practical applications and wide theoretical implications. 
[18] 
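
The requirement quoted above, that satisfaction be “consistent under change of notation,” is usually stated as the satisfaction condition. The notation below is an assumption borrowed from the standard presentation of institutions, since the essay itself introduces no symbols: Sen sends each signature to its set of sentences, Mod sends each signature to its models, and a signature morphism φ from Σ to Σ' plays the role of a change of notation. For every such φ, every Σ'-model M', and every Σ-sentence e:

```latex
\[
  M' \models_{\Sigma'} \mathrm{Sen}(\varphi)(e)
  \quad\Longleftrightarrow\quad
  \mathrm{Mod}(\varphi)(M') \models_{\Sigma} e .
\]
```

Read informally, translating a sentence into the new notation and checking it against a model gives the same answer as translating the model back and checking the original sentence.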


In contrast, Delany’s notion of metalogic is not ‘meta’ in the traditional sense, rather 
it is ‘meta’ in a socio-cultural sense. It begins by looking at formal logical reasoning 
and its relationship to everyday human thought and problem solving. The ‘meta’ 
level from this perspective is the issue of how “logical” reasoning and representation 
in cognitive, social, and cultural contexts diverges from formal logical systems. 
Needless to say, Delany does not present this work as rigorous mathematics (it is 
embedded in a science fiction novel!) and his use of the concept of a “logic,” though 
primarily presented mathematically, is also largely meant metaphorically, without 
clear indication of where the boundaries between these two functions lie. This is not 
troublesome, however, because as seen above in Section 2.1 Delany’s discussion of 
metalogic is multivalent and is meant to comment upon the nuances of social identity 
relationships, to “ground” his novel (it is necessary for genre fiction to “mark” itself as 
conforming to conventions of the genre; in science fiction this is often done with 
detailed reference to mathematics and science) by postulating a well-thought-out 
futuristic system of thought, and probably to explore some of his own thoughts as a 
philosopher and theoretician within the context of a fiction. 


Goguen’s work does address many issues that overlap with those raised in Delany’s account 
of metalogic, but rather than being found in Goguen’s work on “metalogical” 
concerns (institutions), they can be found in his work on qualia and algebraic semiotics. 
In his paper “Time, Structure and Emotion in Music” [19], with Ryoko Goguen, it is 
stated that: 

In formal logic the Law of Identity is stated as “A = A” meaning 
that every object is equal (or identical) to itself...The Law of 
Identity may apply to objects of modern science or technology (e.g. 
numbers), but not to human experience. It appears that human 
senses have been optimized by evolution to find differences, in 
which case identity is the failure to find a significant difference. 
This formulation of identity with regard to human experience also can provide 
commentary on sociological phenomena of identity such as prejudice, or even
politically topical issues such as racial profiling and gender discrimination. It
positions these practices as grounded in failures of sensory perception to account for 
differences (physical or cultural, nuanced or overt) between individuals that 
undoubtedly exist (as attested to by victims of systematic discrimination or profiling!) 
and implicitly states that such practices are the results of failures to respect the 
individuality of humans (instead relying upon inadequate and coarse systems of 
generalization and classification). Furthermore, Goguen emphasizes that it is not only 
truth values of concepts that are important, but qualitative experience in human 
existence. Thus, Goguen is concerned with qualia, often described informally in 
philosophy as “what remains when all objective features are subtracted.” [19] 
Goguen would remark, however, that in lived experience subjective phenomena are 
often attributed at least as much “reality” as so-called “objective” phenomena. 


Informal empirical experimentation and phenomenological analysis have moved 
Goguen to propose a different definition of qualia that avoids some of the vagueness 
of the traditional definition above. Goguen’s definition is: “Qualia are the 
hierarchically organized constituents of conscious experience, each with a saliency 
and an emotional tone.” To demonstrate qualia phenomena, he and Ryoko Goguen 
performed several musical experiments that yielded observations such as the 
following [19]: 
(1) added notes beneath a note can change the character of a top note 
(2) what comes before a note can greatly change its feeling 
(3) what comes after a note can greatly change its feeling 
(4) the apparent duration of a note can be changed by what comes before it 
(5) repetitive phrases are expected to take a role in a larger framework, are 
grouped, and with extreme repetition can become seen as background noise 
and ignored 
(6) a note can appear many times in a piece of music, but will not be interpreted
merely as many instances of that note (the music is interpreted more 
holistically) 
Clearly, though the subject matter is music, these experiments offer a strong 
commentary on the transitory and subjective nature of identity. It is easy to think of 
parallels with social identity such as: prejudices can influence dispositions from an 
individual toward another individual (quale 1 above), impressions of a person after 
meeting him or her can alter dispositions toward that person (quale 2 above), or the 
process of enculturation within a group can allow a shift from ignorance of social 
protocol to full fluency with social protocol, so that interaction becomes automatic 
(quale 5 above). While Goguen does not present such social experiments in his paper, 
probably introspection will allow the reader of this paper to agree with these 
phenomena. In fact, these phenomena are commonplace and not surprising at all. 
What is striking is that such everyday observations seem to illuminate inadequacies of 
common approaches to identity (prejudice and discrete classification), the limitations 
of “objectifying” identity, and the philosophically oft-overlooked importance of 
subjective experience and emotion when accounting for identity. 


Since subjective phenomena rarely, if ever, occur in isolation, Goguen is also
concerned with accounting for how qualia combine. He grounds this account in
Gilles Fauconnier and Mark Turner’s theory of conceptual blending from cognitive
linguistics (along with Goguen’s hierarchical information theory). Goguen and
Goguen describe conceptual blending as the process 
... in which relatively small, transient structures called conceptual
spaces, combine or “blend” to yield a new space that may have
emergent structure. Simple examples are words like “houseboat”
and “roadkill,” and phrases like “artificial life” and “computer
virus.” Blending is considered a basic human cognitive operation,
invisible and effortless, but pervasive and fundamental, for example
in grammar, reasoning, and combinations of text with music. [19]
Important here is the fact that conceptual blending theory has an embodied basis, as
discussed above in Section 2.1. Furthermore, Goguen has developed a theory of algebraic
semiotics that uses algebraic specification from computer science to provide formal
notation to describe sign systems and mappings between them that are capable of
representing conceptual blends. Goguen and I have developed an algorithm that
models some core aspects of conceptual blending theory [20], [21]. This means that
despite the subjective nature of qualia, and the qualitative nature of identity, at least 
some aspects of these phenomena can be approached formally with the use of 
mathematics. Though Goguen is careful to claim that such work is not intended to 
reify the formal models (in parallel with Delany), it is clear that he seeks an account 
of qualia and identity that is precise and rigorous, and that corresponds with the daily 
realities of lived human experience. 


3 Discussion 


3.1 Goguen’s Models and Realities 


Goguen and Delany both seek rigorous accounts of social issues, and both take 
inspiration and ideas from logic and mathematics. Both also exhibit a tension in their 
work between a desire to account for social phenomena as carefully as possible, as 
enabled through construction of intricate models, and to acknowledge the inherent 
limitations of such approaches. In a very broad sense perhaps they are trying to 
reconcile the power of holistic accounts provided by structuralism with deeply felt 
postmodernist understandings of the inadequacies of such global models. The desire 
for rigorous modeling is exhibited as both authors offer semiotic foundations for their 
work. 


In Goguen’s algebraic semiotics the structure of complex signs, including signs in 
diverse media, and the blending of such structures are described using semiotic 
systems (also called sign systems) and semiotic morphisms. A sign system consists 
of [21]: 

a loose algebraic theory composed of type declarations (called
sorts) and operation declarations, usually including axioms and
some constants, plus a level ordering on sorts (having a maximum
element called the top sort) and a priority ordering on the
constituents at each level. Loose sorts classify the parts of signs,
while data sorts classify the values of attributes of signs (e.g., color
and size). Signs of a certain sort are represented by terms of that
sort, including but not limited to constants. Among the operations
in the signature, some are constructors, which build new signs
from given sign parts as inputs. Levels express the whole-part
hierarchy of complex signs, whereas priorities express the relative
importance of constructors and their arguments; social issues play
an important role in determining these orderings. Conceptual spaces
are the special case where there are no operations except those
representing constants and relations, and there is only one sort.
Many details omitted here appear in [11].
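The ingredients listed in the quotation can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions, not the formalization used in [11] or [21]: the class and field names are hypothetical, levels are encoded as integers with 0 marking the top sort, priorities are attached to constructors only, and axioms are omitted altogether.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Constructor:
    """An operation that builds a sign of `result_sort` from sign parts."""
    name: str
    arg_sorts: tuple
    result_sort: str
    priority: int  # assumption: 0 is the most important constructor


@dataclass
class SignSystem:
    levels: dict  # sort name -> level; assumption: level 0 marks the top sort
    data_sorts: frozenset = frozenset()
    constructors: list = field(default_factory=list)

    def top_sort(self):
        tops = [sort for sort, level in self.levels.items() if level == 0]
        if len(tops) != 1:
            raise ValueError("a sign system needs exactly one top sort")
        return tops[0]

    def constructors_for(self, sort):
        """Constructors building signs of `sort`, most important first."""
        found = [c for c in self.constructors if c.result_sort == sort]
        return sorted(found, key=lambda c: c.priority)


# Hypothetical example: a page made of a title and a body.
page_system = SignSystem(
    levels={"Page": 0, "Title": 1, "Body": 1},
    data_sorts=frozenset({"Color"}),
    constructors=[
        Constructor("plain_page", ("Body",), "Page", priority=1),
        Constructor("titled_page", ("Title", "Body"), "Page", priority=0),
    ],
)
```

Here `page_system.top_sort()` picks out the sort at the head of the whole-part hierarchy, and `constructors_for("Page")` lists the ways of building a page in order of priority.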


A semiotic morphism is a mapping between sign systems. One very useful type of 
mapping discussed above is that between information and a representation of that 
information. A semiotic morphism maps sorts, constructors, predicates and functions 
of one sign system to sorts, constructors, predicates and functions of another sign 
system respectively. An example of how a sign system can be represented differently 
via different semiotic morphisms is presented in Figure 1 [11], which depicts 
representations of time as reported by different types of clocks. 


[Figure 1 panels: “A Strange ‘Unary’ Clock” (image not recoverable); “A Naive Digital Clock,” displaying 795; “A Military Time Clock,” displaying 13 | 15]


Fig. 1. Different representations of a clock


Goguen’s diagram depicts a unary clock that simply displays a character repeated a 
number of times equal to the number of elapsed minutes in a day, a simple digital 
clock that simply displays the same number of minutes in standard Arabic numerals, 
and a clock that displays military time. Semiotic morphisms from multiple conceptual 
spaces to a single conceptual space constitute a “blend.” 
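The three clocks can be read as three mappings from one underlying sign, the number of minutes elapsed in the day, to three different displays. The sketch below is an illustration only: the function names are hypothetical, and the mappings are plain functions rather than semiotic morphisms in the technical sense of [11]. The values 795 and 13 | 15 are the ones recoverable from Figure 1.

```python
MINUTES_PER_DAY = 24 * 60


def _check(minutes):
    # The underlying sign is a number of minutes within a single day.
    if not 0 <= minutes < MINUTES_PER_DAY:
        raise ValueError("minutes must lie within a single day")


def unary_clock(minutes, mark="|"):
    """Repeat one character once per elapsed minute."""
    _check(minutes)
    return mark * minutes


def naive_digital_clock(minutes):
    """Show the elapsed minutes as a single Arabic numeral."""
    _check(minutes)
    return str(minutes)


def military_clock(minutes):
    """Show hours and minutes on a 24-hour scale."""
    _check(minutes)
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d} | {mins:02d}"
```

For 795 elapsed minutes, `naive_digital_clock` returns "795" and `military_clock` returns "13 | 15", while `unary_clock` returns a string of 795 marks. All three carry the same information, but they differ greatly in how readable the result is.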


Using a basis in conceptual blending theory and algebraic semiotics, Goguen and I 
have also provided an account of “style,” another subjective and seemingly 
unformalizable topic. Still, we made modest claims that some notions of style can be 
captured by the principles by which concepts and signs are blended, though this is not 
to be seen as analogous to true, context dependent, qualitative human style. In [20], 
we proposed two dimensions of style (regarding computer mediated texts): 
(1) Construction of formal narrative (or other) elements of media structure, at
different levels of granularity. At a large grain level these elements could be
narrative clauses or scenes of a film; at a finer grain they could be
syntactic parameters of clauses, prosody of poems, or types of shots of a
film; and at the smallest grain they could include character sprites or
collectible items in games, specific metaphors in poems, or icons used in a
user interface.

(2) Selection of media and genres, selection of content, principles for how 
content elements can be combined, and controls for changing between media 
and genres. 

Later, we even offered the following bold statement (though we went on to mitigate both of
these claims):
Thus there are at least 12 dimensions of style in this approach, 4 at
each level: choice of domain⁴, content of domain, optimality
principles for blending, and controls for changing domains. [20]
The point here is not the particularities of this notion of style, but rather the desire for 
the “cake” of a formal model of style, while being “able to eat” the facts that we do 
not reify this formalization and we do realize its limitations. 


Indeed, in another paper we make this value very explicit [21]: 

Before briefly discussing algebraic semiotics, it may be helpful to 
be clear about its philosophical orientation. The reason for taking 
special care with this is that, in Western culture, mathematical
formalisms are often given a status beyond what they deserve. For 
example, Euclid wrote, “The laws of nature are but the 
mathematical thoughts of God.” ... Somewhat less grandly, one 
might consider that conceptual spaces are somehow directly 
instantiated in the brain. However, the point of view of this paper is 
that such formalisms are constructed by researchers in the course of 
particular investigations, having the heuristic purpose of facilitating 
consideration of certain issues in that investigation. 


Under this view, all theories are situated social entities, 
mathematical theories no less than others. 
The varyingly humble and enthusiastic claims concerning the nature, and concrete 
applications, of algebraic semiotics illuminate what I assert is a rare attitude toward 
the integration of mathematics and social concern. 


3.2 Delany’s Models and Realities 


A rare attitude, but not unique. Delany’s “Informal Remarks Towards the Modular
Calculus” displays a similar impulse. Part one of the “remarks” consists of the body of
the novel Trouble on Triton itself; other parts of the “remarks” are strewn throughout
other novels Delany has written in a completely different genre. Thus, the literary
theorist Robert Elliot Fox describes Delany’s “modular calculus” as a “mapping of
culture” that “embraces both science fiction and fantasy, as well as
critical/confessional modes.” [10] Using the vehicle of Ashima Slade’s Harbin-Y
Lectures, Delany provides part two of his “informal remarks toward the modular
calculus” [3], discussed below.


4 A “domain” here refers to a collection of knowledge regarding a particular idea or theme.


The character Ashima Slade uses the sentence “The hammer hit a nail” to provide an 
example of some core concepts of the modular calculus. In summing up the modeling 
accomplished by that sentence Slade offers: 

We are modeling attitudes, objects, and various aspects of a relation
between them; to do this job, we are using, among a large group of
things and relations, various of those things and relations to stand
for the objects, attitudes, and relations we wish to model.
Slade continues to explain that there are various ways to express the grammatical and 
semantic relationships evident in the sentence, and likewise there are various ways to 
describe the relationship between, for instance, “the three a’s in the sentence.” If the 
sentence is thought to be formed of only letters and spaces, the ways to describe the 
relationships that make up and describe the sentence are limited. Slade posits that if 
the letters in the sentence were instead made of lines in a matrix on a digital display
(see Figure 2), the ways of describing a list of relations in the sentence would be quite
different, especially considering that letters can be made in multiple forms (see Figure
3).


Fig. 2. Digital display flash-out from Delany’s Trouble on Triton


Fig. 3. Digital letter forms from Delany’s Trouble on Triton


In explicating the modular calculus⁵, Slade distinguishes between modular and non-modular
descriptions. A modular description “preserves some of the modular
properties of the sentence in a list that describes the sentence.” A non-modular 
description “preserves none of the modular relations of the sentence in a list that 
describes the sentence.” Thus, Slade asserts that the digital display is modular 
whereas mere letters and spaces are nonmodular. The modular calculus, then, 
translates between a grammar (a list of sentences about how to compose sentences — 
an inherently nonmodular description even if it is complete), and a modular 
description. Slade concludes with the following remarks about the modular calculus: 

Now the advantages of a modular description of either a modeling
object, like a sentence, or a modeling process, like a language, are
obvious vis-a-vis a nonmodular description. A modular description
allows us reference routes back to the elements in the situation
which is being modeled. A nonmodular description is nonmodular
precisely because, complete or incomplete as it may be, it destroys
those reference routes: it is, in effect, a cipher.


The problem that still remains to the calculus, despite my work, and
that will be discussed in later lectures, is the generation of formal
algorithms for distinguishing incoherent modular descriptive
systems from coherent modular descriptive systems. Indeed, the
calculus has already given us partial descriptions of many such
algorithms, as well as generating ones for determining
completeness, partiality, coherence, and incoherence—processes
which till now had to be considered, as in literature, matters of taste.


5 And distinguishing it from the “modular algebra,” which sadly Delany does not have Slade
explain in depth in the same essay.


The parallel between the two authors’ ideas described above goes far beyond the fact
that both use figures depicting digital displays. Goguen and Delany share a concern
for the various ways to represent a particular sign system, and the fact (following 
Saussure) that “signs come in systems.” [11] Both also are interested in mapping the 
complex ways that sign systems are composed. But recall that Ashima Slade is 
naught but a character in Delany’s “Informal Remarks Towards a Modular Calculus,” 
and that the informal remarks are written in the fictional mode. Slade’s remarks and 
their mathematical timbre serve a metaphorical purpose (though their contents also 
express and reinforce that purpose) which is to express the complexities of meaning 
and identity formations (at the very least Delany raises many other social and 
philosophical issues) with fiction rather than formal modeling and the epistemological 
problems formalisms present. This decision to employ a fictional mode provides an
advantage outlined in an observation of his other character, Bron Helstrom: “Ordinary,
informal, nonrigorous language overcomes all these problems, however, with a 
bravura, panache and elegance that leave the formal logician panting and applauding.” 


As Goguen does with algebraic semiotics, Delany mitigates the modular calculus.
Slade’s fictitious biographer informs us that the modular calculus grew out of Slade’s
earlier work in metalogic. But Bron Helstrom’s lecture on metalogic was completely
undermined by his unsympathetic persona. Bron is a pompous “white” male who
speaks with dominant cultural authority but in fact is filled with insecurities. At one
point he angrily berates a worker on the telephone (or some futuristic version of a 
telephone) whose department had mistakenly placed Miriamne, a cybralogician, in the 
metalogics division. It becomes clear that Bron’s performance is only displayed in 
the hopes of impressing Miriamne (Bron continues pretending to yell at the worker 
even after he is hung up on). He exhibits an inability to relate to the woman in front 
of him, and is completely bewildered by his own identity, revealing the limited utility 
of his ability to pontificate on the subtly nuanced metalogical identity of the Taj 
Mahal. And in the end, the discussion of formally modeling the color of the Taj 
Mahal faded out in the face of lived reality as Bron’s lecture veered toward “muzzy 
eloquence”: “...the thought struck: Somewhere in real space was the real Taj Mahal. 
He had never seen it: He had never been to Earth.”
And the discussion of metalogic itself flashes out as Miriamne changes the subject to
mention that earlier she had run into a female acquaintance that Bron was interested 
in. “What happened next was that his heart began to pound.” 


4 Conclusions 


Composing this paper has been a satisfying exercise that brought into conjunction the 
works of two people whom I admire a great deal. This process raised important 
issues about topics as diverse as social identity, qualia, semiotics, and consciousness, 
but perhaps as importantly, a unifying aesthetic was formed. Both authors offer a
type of groundless [12] work with audacity in approaching “big” issues of life. In
order to locate the ambiguities and consistencies of representation and meaning,
Delany and Goguen each use a divining rod that bifurcates in two seemingly opposite
directions: (1) a desire to rigorously map and exploit regularities of the world(s) we 
inhabit, and (2) a desire to acknowledge and embrace the ambiguities of lived human 
experience and its divergence from any idealized theory. The feelings, sometimes 
tension, sometimes cool detachment, most times deep compassion, the authors evoke 
come in part from the subject matters of their inquiries, and in part from their methods 
and discourse strategies used in their explorations and meditations. I conclude with a few
remarks on a final parallel between the two authors. 


Delany, in a pair of quotations above, through the characters Bron Helstrom and 
Ashima Slade, expressed the “bravura, panache, and elegance” of informal language, 
and the ability of literature to formulate the modular calculus. Goguen, though 
cognizant of the limitations of formal methods, writes that his early formal 
mathematical work “may have an austere kind of beauty from its abstraction and 
generality,” and coined the metaphor of “Tossing Algebraic Flowers Down the Great 
Divide” to describe his life’s work in a biographical paper [14]. In the end, Goguen 
and Delany exhibit aesthetically motivated craftsmanship in their work. They both 
utilize a range of discourse styles, indeed all three that are exhibited in the fictitious 
work of Ashima Slade which are, once again: (1) rational argumentation, (2) logic and 
mathematics, and (3) more esoteric, artistic, or even religious/spiritual discourse. 
Samuel R. Delany’s three modes can be exemplified in:
(1) the genre of critical fiction as in part two of “The Informal Remarks Towards
the Modular Calculus,”
(2) the exposition of metalogic,
(3) and contrasting descriptions of subcultures, both self-indulgent:
Really, breast-bangles on a man? (even a very
young man.) Just aesthetically: weren’t breast
bangles more or less predicated on breasts that, a)
protruded and, b) bobbed...? [4],
and ascetic:
Seven years ago, he’d actually attended a meeting
of the Poor Children of the Avestal Light and
Changing Secret Name; over three instruction
sessions he’d learned the first of the Ninety-Seven
Sayable mantras/mumbles:
Mimimomomizolalilamialomuelamironoriminos... [4]
along with a lyrical beauty, now sparse, now dense, in his prose style.


Joseph A. Goguen’s three modes can be exemplified in: 
(1) his philosophical discussion of qualia with some grounding in the work of 
Martin Heidegger and Edmund Husserl [16] 
(2) a great deal of his work in mathematics; a mild example is the introduction
of the notion of an institution:
...an institution consists of an abstract category
Sign, the objects of which are signatures, a
functor Sen: Sign → Set, and a contravariant
functor Mod: Sign → Set^op (more technically, we
might use classes instead of sets here).
Satisfaction is then a parameterized relation |=_S
between Mod(S) and Sen(S), such that the
following satisfaction condition holds, for any
signature morphism f: S → S', any S'-model M',
and any S-sentence e:
M' |=_S' f(e) iff f(M') |=_S e
This condition expresses the invariance of truth
under change of notation. [18]
(3) his Buddhism-based explorations of phenomenological and even
metaphysical concerns: 
However, if Heidegger and the Buddhists are 
right, it is the possibility of non-being which 
gives beings their character of luminosity, and 
hence the nothing, i.e., shunyata, is not only prior 
to negation, but also to things. 


The effect of this, as Heidegger says, is to rob 
logic of its claim to supremacy, and in particular, 
to rob it of its claim to provide foundations for 
science and even for mathematics. Indeed, we 
must conclude that foundations in the sense 
sought by logicians are simply not possible. The 
judgements that we make, and in particular any 
negative judgements, are necessarily grounded in 
our being-in-the-world, and not in any pre- 
existing unshakable truths, or eternal world of 
ideal things. [17] 
And finally his poetry: 
6:41 am 


Clear leaf cloud masses 
motionlessly moving 

past the static gray road - 
almost too lovely to bear. [13] 


48 D. Fox Harrell 


Acknowledgments 


Joseph Goguen made this paper possible with the gift of his work. When I was 
seeking a Ph.D. program, his algebraic semiotics inspired me to cross the United 
States of America, to move back to the city where I was raised, to work with him on 
combining twin streams of computation and art (especially narrative art). In his 
Meaning and Computation Lab, his advisorship, and his friendship, I found what I 
was seeking. 


Samuel “Chip” Delany I have only met in passing moments, as a fan. In New York 
City he graciously provided me his address so that I could mail him a correspondence 
regarding one of his stories — my favorite short story in existence: “The Tale of 
Rumor and Desire” — I never could find the right words to write him. In San Diego, 
he offered a bit of encouragement on publishing my novel. I thank Delany for forging a
trail in the combination of fantasy and sociology that is my passion. 


References 


1. Bowker, G. C., Star, S. L.: Sorting Things Out: Classification and Its Consequences. The 
MIT Press, Cambridge, MA, (1999) 


2. Collins, F. S.: What we do and don't know about ‘race,’ ‘ethnicity,’ genetics and health at 
the dawn of the genome era. In: Nature Genetics 36, S13-S15, National Human Genome
Research Institute, National Institutes of Health, Bethesda, Maryland, (2004) 

3. Delany, S. R.: Trouble on Triton: An Ambiguous Heterotopia. Wesleyan University Press, 
Hanover, NH, (1976) 

4. Ibid., 2 

5. Ibid., 48 

6. Ibid., 48-55 

7. Ibid., 51 

8. Ibid., 295-296 

9. Ellington, Duke. The Duke Ellington Carnegie Hall Concerts: January 1943, Berkeley:
Prestige, (1977).

10. Fox, R. E.: Conscientious Sorcerers: The Black Postmodernist Fiction of LeRoi Jones/ 
Amiri Baraka, Ishmael Reed, and Samuel R. Delany. Greenwood Press, New York, (1987) 

11. Goguen, J.: An Introduction to Algebraic Semiotics, with Application to User Interface 
Design, In: Proceedings, Computation for Metaphors, Analogy and Agents, edited by 
Chrystopher Nehaniv. Yakamtsu, Japan, (1998) 

12. Goguen, J.: Consciousness and the Decline of Cognitivism. In: Advance Papers, Second 
Workshop on Distributed Collective Practice. University of California, San Diego, (2002) 

13. Goguen, J.: November Qualia, URL = 
<http://www.cs.ucsd.edu/users/goguen/misc/novq.html>. 

14. Goguen, J.: Tossing Algebraic Flowers Down the Great Divide. In: C. S. Calude, (ed.): 
People and Ideas in Theoretical Computer Science, Springer, New York, (1999) 

15. Ibid., 1 

16. Goguen, J.A.: Musical Qualia, Context, Time and Emotion. In: J.A. Goguen and E. Myin 
(eds.): Journal of Consciousness Studies, Volume 11, No. 3-4, March-April, Imprint 
Academic, (2004) 


Metalogic, Qualia, and Identity on Neptune's Great Moon 49 


17. Goguen, J. A.: Truth and Meaning. In: Four Pieces on Error, Truth and Reality, Technical 
Monograph PRG-89, Oxford University Computing Laboratory Programming Research 
Group, Oxford, (1990) 

18. Goguen, J., Burstall, R.: Introducing Institutions. In: Logics of Programs (Carnegie- 
Mellon University, June 1983), Lecture Notes in Computer Science, Volume 164, Springer, 
(1984) 221-256 

19. Goguen, J., Goguen, R.: Time, Structure and Emotion in Music, Japanese translation by 
Sumi Adachi to appear in book of University Lectures at Keio University, (2003-2004) 

20. Goguen, J., Harrell, D. F.: Foundations for Active Multimedia Narrative: Semiotic Spaces 
and Structural Blending. In revision, (2006) 

21. Goguen, J., Harrell, D. F.: Style as Choice of Blending Principles. In: Style and Meaning in 
Language, Art, Music and Design, Proceedings of a Symposium at the 2004 AAAI Fall 
Symposium Series, Technical Report FS-04-07, AAAI Press, Washington DC, October 21- 
24, (2004) 

22. Jahn, M.: Narratology: A Guide to the Theory of Narrative, URL = 
<http://www.uni-koeln.de/~ame02/pppn.htm>, N2.3.1. 

23. Saar, Betye. Colored: Consider the Rainbow, Michael Rosenfeld Gallery, New York,
(2003)

24. Saussure, F.: Course in General Linguistics. Translated by Roy Harris. Duckworth, 
London, (1976) 

25. Tye, M.: Qualia, The Stanford Encyclopedia of Philosophy (Summer 2003 Edition), 
Edward N. Zalta (ed.), URL = <http://plato.stanford.edu/archives/sum2003/entries/qualia/>. 

26. Varela, F. J., Thompson, E., Rosch, E.: The embodied mind: Cognitive science and human 
experience. The MIT Press, Cambridge, MA, (1991) 


Quantum Institutions 


Carlos Caleiro, Paulo Mateus, Amilcar Sernadas, and Cristina Sernadas 


CLC, Department of Mathematics, IST, 
Av. Rovisco Pais, 1000-149 Lisbon, Portugal 


Abstract. The exogenous approach to enriching any given base logic 
for probabilistic and quantum reasoning is brought into the realm of 
institutions. The theory of institutions helps in capturing the precise 
relationships between the logics that are obtained, and, furthermore, 
helps in analyzing some of the key design decisions and opens the way 
to make the approach more useful and, at the same time, more abstract. 


1 Introduction 


A new logic was proposed in [1] [2] [3] for modeling and reasoning about quantum
states, embodying the relevant postulates of quantum physics (as presented, for 
instance, in [4]) and adopting the exogenous approach (the original models are 
kept). The logic was designed from the semantics upwards, starting with the key 
idea of adopting superpositions of classical models as the models of the quantum 
logic. In [5], other instances of the exogenous approach to enriching logics were 
presented in detail. In short, the exogenous approach is based on adopting as 
models of the new envisaged logic (enriched) sets of models of the given base 
logic without tampering with the models of the original logic. As an example 
assume that we want to introduce probabilities to a certain logic. Doing so, using 
the exogenous approach, means that we consider the possible outcomes to be the 
semantic structures and we assign probabilities to sets of such structures. 
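The probabilistic example can be made concrete for a propositional base logic. The following is a minimal sketch under assumptions that go beyond the text: the base logic is classical propositional logic, formulas are represented as Python predicates on valuations, and all names are hypothetical. The original valuations are left untouched; only a measure over them is added.

```python
from fractions import Fraction
from itertools import product


def valuations(symbols):
    """All classical valuations (the base-logic models) over `symbols`."""
    return [dict(zip(symbols, bits))
            for bits in product([False, True], repeat=len(symbols))]


def probability(measure, formula):
    """Probability of the set of valuations that satisfy `formula`.

    `measure` is a list of (valuation, weight) pairs whose weights sum to 1.
    """
    return sum((weight for valuation, weight in measure if formula(valuation)),
               Fraction(0))


# Hypothetical example: a uniform measure over the four valuations of p and q.
worlds = valuations(["p", "q"])
uniform = [(v, Fraction(1, len(worlds))) for v in worlds]
p_or_q = lambda v: v["p"] or v["q"]
```

Under the uniform measure, `probability(uniform, p_or_q)` is 3/4, since three of the four valuations satisfy the formula.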

This novel approach to quantum logic semantics is completely different from 
the traditional approach [6] [7] to the problem, as initially proposed by Birkhoff 
and von Neumann [8], that focuses on the lattice of closed subspaces of a Hilbert 
space. The main drawback of Birkhoff and von Neumann’s approach is that it 
does not yield an extension of classical logic. Our semantics has the advantage 
of closely guiding the design of the language around the underlying concepts 
of quantum physics while keeping the classical connectives and was inspired by 
the Kripke semantics for modal logic. The possible worlds approach was also 
used in [9] [10] [11] [12] [13] for probabilistic logic. Our semantics for quantum logic,
although inspired by modal logic, is also completely different from the alternative
Kripke semantics given to traditional quantum logics (as first proposed in [14]),
which is still closely related to the lattice-based operations. The resulting quantum logic
also incorporates probabilistic reasoning (in the style of Nilsson’s calculus [9] [10])
since the postulates of quantum physics impose uncertainty on the outcome of 
measurements. From a quantum state (superposition of classical valuations living 
in a suitable Hilbert space) it is straightforward to generate a probability space
of classical valuations in order to provide the semantics for reasoning about the
probabilistic measurements made on that state. 
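The passage from a quantum state to a probability space can be sketched as follows. This is an illustration under explicit assumptions, not the construction of the paper: a state is represented as a finite map from classical valuations to complex amplitudes, and the probability of a valuation is taken to be the squared modulus of its amplitude. The function name is hypothetical.

```python
import math


def measurement_probabilities(state):
    """Turn a quantum state over classical valuations into a probability space.

    `state` maps each valuation (here a tuple of booleans) to a complex
    amplitude. Assumption: probabilities are squared moduli of amplitudes.
    """
    norm_squared = sum(abs(a) ** 2 for a in state.values())
    if not math.isclose(norm_squared, 1.0):
        raise ValueError("a quantum state must be a unit vector")
    return {valuation: abs(a) ** 2 for valuation, a in state.items()}


# Hypothetical example: an equal superposition of the two valuations of one symbol.
state = {(False,): 1 / math.sqrt(2), (True,): 1j / math.sqrt(2)}
probs = measurement_probabilities(state)
```

In this example both valuations receive probability one half (up to floating-point rounding), even though their amplitudes differ by a phase.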

Herein, we present within the theory of institutions (a logic is identified with
an institution, as originally proposed in [15] [16]), the exogenous-style construction
of a quantum logic from any given base logic in order to assess how general
the construction is. The construction is carried out in three main steps. Given
an arbitrary institution we first build its global extension (globalization) where 
each model is just a set of models of the original institution. Then, we proceed 
with the construction of its probabilistic extension (probabilization) where each 
model is a probability space where the outcomes are models of the original in- 
stitution. Finally, we obtain the quantum extension (quantization) of the given 
institution where each model is a unit vector in the Hilbert space freely gen- 
erated from a set of models of the original institution. Obviously, in each step 
the language is enriched to take advantage of, and to express properties of, the new
models. For instance, in the globalization step, global classical connectives are
added for reasoning about formulas of the original logic. The institutional per- 
spective allows us to conclude that the first two constructions are fully general, 
in the sense that nothing is assumed about the given institution and also that 
nothing else is needed. But quantization requires some additional information 
(the choice of qubit formulae). 

In Section 2, we briefly present the relevant notions and results of the theory 
of institutions. The globalization step is described in Section 3. The probabi- 
lization step is presented in Section 4. Finally, in Section 5 we carry out the 
quantization step of the enrichment. We conclude with an outline of further 
research directions. 


2 Institutional Preliminaries 


In this paper, as a first step towards the full understanding of the proposed 
approach to enriching logics, we shall adopt a variant of the original notion of 
institution, without morphisms between models (cf. [17]). For simplicity we shall
just call it an institution, without any further qualifiers. We denote by Cls the 
category with classes as objects and maps between classes as morphisms. 

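An institution is a tuple $I = (\mathrm{Sig}, \mathrm{Sen}, \mathrm{Mod}, \Vdash)$ where: $\mathrm{Sig}$ is a category
(of signatures); $\mathrm{Sen} : \mathrm{Sig} \to \mathrm{Set}$ is a (formula) functor; $\mathrm{Mod} : \mathrm{Sig} \to \mathrm{Cls}^{op}$
is a (model) functor; and $\Vdash = \{\Vdash_\Sigma\}_{\Sigma \in |\mathrm{Sig}|}$ is a family of (satisfaction) relations
$\Vdash_\Sigma \subseteq \mathrm{Mod}(\Sigma) \times \mathrm{Sen}(\Sigma)$, such that the following satisfaction condition holds,
for every signature morphism $\sigma : \Sigma \to \Sigma'$, every formula $\varphi \in \mathrm{Sen}(\Sigma)$, and every
model $m' \in \mathrm{Mod}(\Sigma')$: $\mathrm{Mod}(\sigma)(m') \Vdash_\Sigma \varphi$ iff $m' \Vdash_{\Sigma'} \mathrm{Sen}(\sigma)(\varphi)$.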
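
As an illustration only, the satisfaction condition can be checked exhaustively on a toy
propositional institution. The encoding below is our own assumption and not part of the
formal development: signatures are tuples of symbols, a signature morphism is a dictionary
renaming symbols, and the names `sen`, `mod`, `sat` and `models` are hypothetical helpers.

```python
from itertools import product

# Hypothetical toy institution (an assumption made for illustration):
# formulas are symbols (str) or tuples ("not", f) / ("imp", f, g);
# models are dicts mapping each symbol of the signature to a truth value.

def sen(sigma, phi):
    """Sen(sigma): translate a formula by renaming its symbols along sigma."""
    if isinstance(phi, str):
        return sigma[phi]
    op, *args = phi
    return (op, *(sen(sigma, a) for a in args))

def mod(sigma, m_prime):
    """Mod(sigma): reduct of a target model along sigma (contravariant)."""
    return {p: m_prime[q] for p, q in sigma.items()}

def sat(m, phi):
    """Satisfaction of a classical propositional formula by a valuation."""
    if isinstance(phi, str):
        return m[phi]
    op, *args = phi
    if op == "not":
        return not sat(m, args[0])
    if op == "imp":
        return (not sat(m, args[0])) or sat(m, args[1])
    raise ValueError(op)

def models(signature):
    """All valuations over a finite signature."""
    return [dict(zip(signature, bits))
            for bits in product([False, True], repeat=len(signature))]

sigma = {"p": "a", "q": "a"}            # morphism from ("p", "q") to ("a", "b")
phi = ("imp", "p", ("not", "q"))

# Mod(sigma)(m') satisfies phi  iff  m' satisfies Sen(sigma)(phi), for every m'.
satisfaction_condition = all(
    sat(mod(sigma, m2), phi) == sat(m2, sen(sigma, phi))
    for m2 in models(("a", "b"))
)
```

The check ranges over all four valuations of the target signature, so it is a test of this
particular instance rather than a proof of the condition in general.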

As usual, given a set $\Gamma \subseteq \mathrm{Sen}(\Sigma)$ of formulas and a model $m \in \mathrm{Mod}(\Sigma)$,
we will write $m \Vdash_\Sigma \Gamma$ to denote the fact that $m \Vdash_\Sigma \varphi$ for every $\varphi \in \Gamma$. Mutatis
mutandis, given a set $M \subseteq \mathrm{Mod}(\Sigma)$ of models and a formula $\varphi \in \mathrm{Sen}(\Sigma)$,
we will write $M \Vdash_\Sigma \varphi$ to denote the fact that $m \Vdash_\Sigma \varphi$ for every $m \in M$.
Recall that $I$ induces a family $\vDash = \{\vDash_\Sigma\}_{\Sigma \in |\mathrm{Sig}|}$ of (entailment) relations
$\vDash_\Sigma \subseteq \mathrm{Pw}(\mathrm{Sen}(\Sigma)) \times \mathrm{Sen}(\Sigma)$ defined by $\Gamma \vDash_\Sigma \varphi$ if, for every $m \in \mathrm{Mod}(\Sigma)$,
if $m \Vdash_\Sigma \Gamma$ then $m \Vdash_\Sigma \varphi$.

The notions of arrow between institutions are at least as important as the
notion of institution itself. There is a rather extensive and prolific bibliography
on this subject, where various meaningful notions of arrows between institutions
are proposed, used, exemplified, and related with each other. A recent systematization
of the field can be found in [17]. The notion of arrow that we will be
using in this paper can be classified as a comorphism (or a plain map as originally
named in [18], or also a representation as renamed in [19]). It is however
a modified comorphism that maps models to sets of models, which can be explained
as an instance of the general monad construction of [20]. The definition
will take advantage of the usual covariant powerset endofunctor $\mathrm{Pw}$, in this case
extended to classes, that is, $\mathrm{Pw} : \mathrm{Cls} \to \mathrm{Cls}$ is such that $\mathrm{Pw}(X) = 2^X$, and
$\mathrm{Pw}(f : X \to X')$ maps each $Y \subseteq X$ to $f[Y] = \{f(x) : x \in Y\}$.


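Definition 1. A power-model comorphism from institution $I$ to institution $I'$
is a tuple $(\Phi, \alpha, \beta)$ where $\Phi : \mathrm{Sig} \to \mathrm{Sig}'$ is a (signature translation) functor;
$\alpha : \mathrm{Sen} \to \mathrm{Sen}' \circ \Phi$ is a (formula translation) natural transformation; and
$\beta : \mathrm{Mod}' \circ \Phi \to \mathrm{Pw} \circ \mathrm{Mod}$ is a (power-model translation) natural transformation,
such that the following coherence condition holds, for every signature $\Sigma \in |\mathrm{Sig}|$,
formula $\varphi \in \mathrm{Sen}(\Sigma)$, and model $m' \in \mathrm{Mod}'(\Phi(\Sigma))$:
$\beta_\Sigma(m') \Vdash_\Sigma \varphi$ iff $m' \Vdash'_{\Phi(\Sigma)} \alpha_\Sigma(\varphi)$.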


In the definition above, $\beta_\Sigma(m')$ is a set of models. Thus, the coherence condition
states that $m' \Vdash'_{\Phi(\Sigma)} \alpha_\Sigma(\varphi)$ iff, for every $m \in \beta_\Sigma(m')$, $m \Vdash_\Sigma \varphi$. Clearly,
the possibility that $\beta_\Sigma(m') = \emptyset$ is not excluded. In that case, $m'$ must satisfy
the translation via $\alpha$ of any formula whatsoever. A particularly interesting case
corresponds to the situation when $\beta_\Sigma(m')$ is a singleton. If this happens for every
model then we can recast the power-model natural transformation simply to
$\beta : \mathrm{Mod}' \circ \Phi \to \mathrm{Mod}$, thus obtaining the usual notion of comorphism.

It is a well known fact that comorphisms preserve entailment. A further sim- 
ple condition on the surjectivity of the translation of models can also guarantee 
the reflection of entailment. Such properties were studied in [21]. These results 
can easily be lifted to the level of power-model comorphisms, as stated below. 
(Power-model) comorphisms compose in the usual way. 


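Proposition 1. Let $I$ and $I'$ be institutions and $(\Phi, \alpha, \beta) : I \to I'$ a power-model
comorphism. Then $\Gamma \vDash_\Sigma \varphi$ implies $\alpha_\Sigma[\Gamma] \vDash'_{\Phi(\Sigma)} \alpha_\Sigma(\varphi)$. Additionally, if
for each $m \in \mathrm{Mod}(\Sigma)$ there exists $m' \in \mathrm{Mod}'(\Phi(\Sigma))$ such that $\beta_\Sigma(m') = \{m\}$,
then $\Gamma \vDash_\Sigma \varphi$ iff $\alpha_\Sigma[\Gamma] \vDash'_{\Phi(\Sigma)} \alpha_\Sigma(\varphi)$.
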

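Proof. Given the power-model comorphism, assume that $\Gamma \vDash_\Sigma \varphi$. If $m' \in \mathrm{Mod}'(\Phi(\Sigma))$
is such that $m' \Vdash'_{\Phi(\Sigma)} \alpha_\Sigma[\Gamma]$ then, using the coherence condition
of the power-model comorphism, we have that $\beta_\Sigma(m') \Vdash_\Sigma \Gamma$. Thus, by definition
of entailment, it follows from $\Gamma \vDash_\Sigma \varphi$ that $\beta_\Sigma(m') \Vdash_\Sigma \varphi$. Using again the
coherence condition, we now get $m' \Vdash'_{\Phi(\Sigma)} \alpha_\Sigma(\varphi)$. Hence, $\alpha_\Sigma[\Gamma] \vDash'_{\Phi(\Sigma)} \alpha_\Sigma(\varphi)$.

Assume now that the additional surjectivity condition holds and $\alpha_\Sigma[\Gamma] \vDash'_{\Phi(\Sigma)} \alpha_\Sigma(\varphi)$.
If $m \in \mathrm{Mod}(\Sigma)$ is such that $m \Vdash_\Sigma \Gamma$ then $\{m\} \Vdash_\Sigma \Gamma$. But we know that
there exists $m' \in \mathrm{Mod}'(\Phi(\Sigma))$ such that $\beta_\Sigma(m') = \{m\}$. Thus, $\beta_\Sigma(m') \Vdash_\Sigma \Gamma$
and it follows from the coherence condition of the power-model comorphism that
$m' \Vdash'_{\Phi(\Sigma)} \alpha_\Sigma[\Gamma]$. Hence, by definition of entailment, it follows that
$m' \Vdash'_{\Phi(\Sigma)} \alpha_\Sigma(\varphi)$. Using again the coherence condition we obtain that $\beta_\Sigma(m') \Vdash_\Sigma \varphi$, or
equivalently, $m \Vdash_\Sigma \varphi$. Therefore, $\Gamma \vDash_\Sigma \varphi$. □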


Hence, the existence of a power-model comorphism that fulfills the surjectivity
condition stated in the second half of Proposition 1, for every signature, allows
one to say that the target institution is a conservative extension of the source
institution. Note that, for comorphisms, the surjectivity condition stated above
simply boils down to requiring that each map $\beta_\Sigma : \mathrm{Mod}'(\Phi(\Sigma)) \to \mathrm{Mod}(\Sigma)$
is surjective. It is also a trivial task to check that the surjectivity condition is
preserved by composing (power-model) comorphisms.





3 Global Institution 


As a first step in our development, we aim at characterizing the exogenous
enrichment of a given logic with a layer of global reasoning. For this purpose, let
$I = (\mathrm{Sig}, \mathrm{Sen}, \mathrm{Mod}, \Vdash)$ be the starting institution. We now proceed by defining
the envisaged global institution $I^g$ and then showing, by means of a power-model
comorphism, that it extends $I$ in a conservative way.


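Definition 2. The global institution $I^g = (\mathrm{Sig}, \mathrm{Sen}^g, \mathrm{Mod}^g, \Vdash^g)$ based on $I$ is
defined as follows:

- $\mathrm{Sen}^g(\Sigma)$ is the least set containing $\mathrm{Sen}(\Sigma)$ such that, if $\delta, \delta_1, \delta_2 \in \mathrm{Sen}^g(\Sigma)$
  then $(\boxminus\delta), (\delta_1 \sqsupset \delta_2) \in \mathrm{Sen}^g(\Sigma)$;
- $\mathrm{Sen}^g(\sigma) = \sigma^g$ is defined inductively by: $\sigma^g(\varphi) = \mathrm{Sen}(\sigma)(\varphi)$, $\sigma^g(\boxminus\delta) = (\boxminus\sigma^g(\delta))$,
  and $\sigma^g(\delta_1 \sqsupset \delta_2) = (\sigma^g(\delta_1) \sqsupset \sigma^g(\delta_2))$;
- $\mathrm{Mod}^g(\Sigma) = \{M : \emptyset \neq M \subseteq \mathrm{Mod}(\Sigma)\}$;
- $\mathrm{Mod}^g(\sigma)(M') = \mathrm{Mod}(\sigma)[M']$;
- $\Vdash^g_\Sigma$ is defined inductively by: $M \Vdash^g_\Sigma \varphi$ iff $M \Vdash_\Sigma \varphi$, $M \Vdash^g_\Sigma (\boxminus\delta)$ iff $M \not\Vdash^g_\Sigma \delta$,
  and $M \Vdash^g_\Sigma (\delta_1 \sqsupset \delta_2)$ iff $M \not\Vdash^g_\Sigma \delta_1$ or $M \Vdash^g_\Sigma \delta_2$.
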

Clearly, $I^g$ is an institution. Indeed, the functoriality of $\mathrm{Sen}^g$ and $\mathrm{Mod}^g$ is
straightforward. The satisfaction condition of $I^g$ can be established by a simple
induction on formulas. The only interesting case is the base case, which we analyze
below, the other cases being immediate by the induction hypotheses. Let $\sigma : \Sigma \to \Sigma'$
be a signature morphism, $\varphi \in \mathrm{Sen}(\Sigma)$ and $M' \in \mathrm{Mod}^g(\Sigma')$. Then, by
definition of $\mathrm{Sen}^g$ and $\Vdash^g$, $M' \Vdash^g_{\Sigma'} \mathrm{Sen}^g(\sigma)(\varphi)$ iff $M' \Vdash_{\Sigma'} \mathrm{Sen}(\sigma)(\varphi)$, that is,
$m' \Vdash_{\Sigma'} \mathrm{Sen}(\sigma)(\varphi)$ for every $m' \in M'$. Therefore, using the satisfaction condition
of $I$, this is equivalent to having $\mathrm{Mod}(\sigma)(m') \Vdash_\Sigma \varphi$ for every $m' \in M'$, that is,
$\mathrm{Mod}^g(\sigma)(M') \Vdash^g_\Sigma \varphi$.

In the resulting logic, the connectives $\boxminus$ and $\sqsupset$ correspond to global negation
and global implication, respectively. Other connectives can be easily introduced,
like global conjunction $(\delta_1 \sqcap \delta_2) = \boxminus(\delta_1 \sqsupset (\boxminus\delta_2))$. If the base institution has
a negation $\neg$ and an implication $\Rightarrow$, which can be understood as local, these
connectives do not collapse with the global ones. For implication, for instance,
we have that $\{(\varphi_1 \Rightarrow \varphi_2)\} \vDash^g_\Sigma (\varphi_1 \sqsupset \varphi_2)$, but the converse does not hold in general,
given two base formulas $\varphi_1, \varphi_2 \in \mathrm{Sen}(\Sigma)$. Namely, assume that $I$ is the
institution of classical propositional logic, $\pi_1, \pi_2 \in \Sigma$ are two propositional symbols
and $v_1, v_2 \in \mathrm{Mod}(\Sigma)$ are two classical valuations such that $v_1(\pi_1) = 0$,
$v_1(\pi_2) = 0$, $v_2(\pi_1) = 1$ and $v_2(\pi_2) = 0$. Then, $\{v_1, v_2\} \Vdash^g_\Sigma (\pi_1 \sqsupset \pi_2)$ but
$\{v_1, v_2\} \not\Vdash^g_\Sigma (\pi_1 \Rightarrow \pi_2)$. The logic resulting from globalizing classical propositional
logic was carefully studied in [5], where a sound and complete calculus
could be obtained by capitalizing on a calculus for classical logic and adding an
axiomatization of the new connectives. It is an open question whether the same sort of
enterprise can be carried out in the general case. However, it seems possible to generalize
the technique used there, at least if the base logic enjoys an expressibility
property analogous to the disjunctive normal form of classical logic.
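
The gap between local and global implication can be replayed mechanically. The following
is a minimal sketch under our own assumptions: the base institution is classical
propositional logic, formulas are encoded as nested tuples, and the tags `"gneg"` and
`"gimp"` are hypothetical names for the global negation and the global implication.

```python
def local_sat(v, phi):
    """Base (local) satisfaction of a propositional formula by a valuation v."""
    if isinstance(phi, str):
        return v[phi]
    op, *args = phi
    if op == "not":
        return not local_sat(v, args[0])
    if op == "imp":
        return (not local_sat(v, args[0])) or local_sat(v, args[1])
    raise ValueError(op)

def global_sat(M, delta):
    """Global satisfaction of a formula by a non-empty set M of valuations."""
    if isinstance(delta, tuple) and delta[0] == "gneg":
        return not global_sat(M, delta[1])
    if isinstance(delta, tuple) and delta[0] == "gimp":
        return (not global_sat(M, delta[1])) or global_sat(M, delta[2])
    # Base case: a base formula holds globally iff every model in M satisfies it.
    return all(local_sat(v, delta) for v in M)

v1 = {"pi1": False, "pi2": False}
v2 = {"pi1": True, "pi2": False}
M = [v1, v2]

holds_global = global_sat(M, ("gimp", "pi1", "pi2"))   # global implication
holds_local = global_sat(M, ("imp", "pi1", "pi2"))     # local implication, read over M
```

Here `holds_global` is true because `pi1` fails at `v1` and hence is not globally satisfied,
whereas `holds_local` is false because `v2` falsifies the local implication.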

More interesting, at the moment, is to establish the precise relationship between
the institutions $I$ and $I^g$.


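Proposition 2. The triple $C^g = (\Phi^g, \alpha^g, \beta^g)$, where $\Phi^g$ is the identity functor
on $\mathrm{Sig}$; for each $\Sigma$, $\alpha^g_\Sigma$ translates $\varphi \in \mathrm{Sen}(\Sigma)$ to $\varphi$; and for each $\Sigma$, $\beta^g_\Sigma$ translates
$M \in \mathrm{Mod}^g(\Sigma)$ to $M$, is a power-model comorphism $C^g : I \to I^g$ and fulfills the
surjectivity condition.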


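Proof. The naturality of $\alpha^g$ and $\beta^g$ is straightforward. Given a signature morphism
$\sigma : \Sigma \to \Sigma'$ and $\varphi \in \mathrm{Sen}(\Sigma)$, we have $\mathrm{Sen}^g(\sigma)(\alpha^g_\Sigma(\varphi)) = \mathrm{Sen}^g(\sigma)(\varphi) =
\mathrm{Sen}(\sigma)(\varphi) = \alpha^g_{\Sigma'}(\mathrm{Sen}(\sigma)(\varphi))$. Similarly, given $M' \in \mathrm{Mod}^g(\Sigma')$, we have
$\mathrm{Pw}(\mathrm{Mod}(\sigma))(\beta^g_{\Sigma'}(M')) = \mathrm{Pw}(\mathrm{Mod}(\sigma))(M') = \mathrm{Mod}(\sigma)[M'] = \mathrm{Mod}^g(\sigma)(M')
= \beta^g_\Sigma(\mathrm{Mod}^g(\sigma)(M'))$. The coherence condition is trivial. Surjectivity holds because
$\beta^g_\Sigma(\{m\}) = \{m\}$ for every $m \in \mathrm{Mod}(\Sigma)$. □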


As a corollary, by Proposition 1, $C^g$ shows that $I^g$ is in fact a conservative
extension of $I$.


4 Probability Institution 


Let us now characterize the exogenous enrichment of a given logic with probabilistic
reasoning. We start by introducing the essential definitions and properties
of probability spaces. A probability space over a non-empty set $\Omega$ of outcomes
is a pair $P = (\mathcal{B}, \mu)$ where $\mathcal{B}$ is a Borel field over $\Omega$, that is, $\mathcal{B} \subseteq 2^\Omega$ contains
$\Omega$ and is closed for complements and countable unions; and $\mu : \mathcal{B} \to [0,1]$ is a
measure with unitary mass, that is, $\mu(\Omega) = 1$ and $\mu(\bigcup_{i \in \mathbb{N}} B_i) = \sum_{i \in \mathbb{N}} \mu(B_i)$ if
$\{B_i\}_{i \in \mathbb{N}} \subseteq \mathcal{B}$ is a family of pairwise disjoint sets.

In due course, we will need to map probability spaces along functions on their
outcomes. Let $f : U \to U'$ be a function, $\Omega \subseteq U$ and $P = (\mathcal{B}, \mu)$ a probability
space over $\Omega$. The image of $P$ along $f$ is the probability space $f(P) = (\mathcal{B}', \mu')$
over $\Omega' = f[\Omega]$ where $\mathcal{B}' = \{B' \subseteq \Omega' : f^{-1}(B') \cap \Omega \in \mathcal{B}\}$; and $\mu'$ is such that
$\mu'(B') = \mu(f^{-1}(B') \cap \Omega)$.

Let $I = (\mathrm{Sig}, \mathrm{Sen}, \mathrm{Mod}, \Vdash)$ be the starting institution. As before, we shall
first define the envisaged probability institution $I^p$ and then show, using power-model
comorphisms, that $I^p$ extends conservatively both $I$ and $I^g$. Indeed, the
whole idea is to work with sets of models of the original institution, as in the
global case, but now endow them with a certain probability measure. Of course,
the linguistic resources of the logic will also be augmented to allow probabilistic
assertions and reasoning. To that end, we assume fixed a set $X$ of variables.
We shall also denote by $\mathbb{R}$ the set of all computable real numbers (see [22]).


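Definition 3. The probability institution $I^p = (\mathrm{Sig}, \mathrm{Sen}^p, \mathrm{Mod}^p, \Vdash^p)$ based on
$I$ is defined as follows:

- $\mathrm{Sen}^p(\Sigma)$ is the least set containing $\mathrm{Sen}(\Sigma)$ such that: $(\boxminus\delta), (\delta_1 \sqsupset \delta_2) \in \mathrm{Sen}^p(\Sigma)$
  if $\delta, \delta_1, \delta_2 \in \mathrm{Sen}^p(\Sigma)$, and $(t_1 \le t_2) \in \mathrm{Sen}^p(\Sigma)$ if $t_1, t_2 \in T^p(\Sigma)$,
  where $T^p(\Sigma)$ is the least set (of probabilistic terms) such that: $X, \mathbb{R} \subseteq T^p(\Sigma)$,
  $(\int\varphi) \in T^p(\Sigma)$ if $\varphi \in \mathrm{Sen}(\Sigma)$, and $(t_1 + t_2), (t_1 . t_2) \in T^p(\Sigma)$ if $t_1, t_2 \in T^p(\Sigma)$;
- $\mathrm{Sen}^p(\sigma) = \sigma^p$ is defined inductively by: $\sigma^p(\varphi) = \mathrm{Sen}(\sigma)(\varphi)$, $\sigma^p(\boxminus\delta) = (\boxminus\sigma^p(\delta))$,
  $\sigma^p(\delta_1 \sqsupset \delta_2) = (\sigma^p(\delta_1) \sqsupset \sigma^p(\delta_2))$, and $\sigma^p(t_1 \le t_2) = (T^p(\sigma)(t_1) \le T^p(\sigma)(t_2))$,
  where $T^p(\sigma)$ is inductively defined by: $T^p(\sigma)(x) = x$, $T^p(\sigma)(r) = r$,
  $T^p(\sigma)(\int\varphi) = (\int\mathrm{Sen}(\sigma)(\varphi))$, $T^p(\sigma)(t_1 + t_2) = (T^p(\sigma)(t_1) + T^p(\sigma)(t_2))$, and
  $T^p(\sigma)(t_1 . t_2) = (T^p(\sigma)(t_1) . T^p(\sigma)(t_2))$;
- $\mathrm{Mod}^p(\Sigma)$ is the class of all triples $S = (M, P, \rho)$ where $M$ is a non-empty
  subset of $\mathrm{Mod}(\Sigma)$, $P = (\mathcal{B}, \mu)$ is a probability space over $M$ such that
  $\{m \in M : m \Vdash_\Sigma \varphi\} \in \mathcal{B}$ for every $\varphi \in \mathrm{Sen}(\Sigma)$, and $\rho : X \to \mathbb{R}$ is an
  assignment;
- $\mathrm{Mod}^p(\sigma)((M', P', \rho')) = (\mathrm{Mod}(\sigma)[M'], \mathrm{Mod}(\sigma)(P'), \rho')$;
- $\Vdash^p_\Sigma$ is defined inductively by: $S \Vdash^p_\Sigma \varphi$ iff $M \Vdash_\Sigma \varphi$, for $\varphi \in \mathrm{Sen}(\Sigma)$,
  $S \Vdash^p_\Sigma (\boxminus\delta)$ iff $S \not\Vdash^p_\Sigma \delta$, $S \Vdash^p_\Sigma (\delta_1 \sqsupset \delta_2)$ iff $S \not\Vdash^p_\Sigma \delta_1$ or $S \Vdash^p_\Sigma \delta_2$, and
  $S \Vdash^p_\Sigma (t_1 \le t_2)$ iff $[\![t_1]\!]^S \le [\![t_2]\!]^S$, where the denotation of probabilistic terms
  $[\![\_]\!]^S : T^p(\Sigma) \to \mathbb{R}$ is defined inductively by: $[\![x]\!]^S = \rho(x)$, for $x \in X$,
  $[\![r]\!]^S = r$, for $r \in \mathbb{R}$, $[\![\int\varphi]\!]^S = \mu(\{m \in M : m \Vdash_\Sigma \varphi\})$,
  $[\![t_1 + t_2]\!]^S = [\![t_1]\!]^S + [\![t_2]\!]^S$ and $[\![t_1 . t_2]\!]^S = [\![t_1]\!]^S . [\![t_2]\!]^S$.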
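
To make the denotation of probabilistic terms concrete, here is a small sketch restricted
to finite sets of models with the full powerset as Borel field. The term encoding
(`"prob"`, `"+"`, `"."`) and the helper names are assumptions introduced only for this
illustration, and exact rationals stand in for computable reals.

```python
from fractions import Fraction

def denote(term, M, mu, rho, sat):
    """Denotation of a probabilistic term in S = (M, (2^M, mu), rho).

    Terms: a variable name (str), a number, ("prob", phi), ("+", t1, t2), (".", t1, t2).
    mu maps the index of each model in M to its point mass; sat(m, phi) is base satisfaction.
    """
    if isinstance(term, str):
        return rho[term]
    if isinstance(term, (int, Fraction)):
        return Fraction(term)
    op = term[0]
    if op == "prob":
        # Probability of the set of models in M that satisfy phi.
        return sum((mu[i] for i, m in enumerate(M) if sat(m, term[1])), Fraction(0))
    left, right = (denote(t, M, mu, rho, sat) for t in term[1:])
    return left + right if op == "+" else left * right

def holds_leq(t1, t2, M, mu, rho, sat):
    """S satisfies (t1 <= t2) iff the denotation of t1 is at most that of t2."""
    return denote(t1, M, mu, rho, sat) <= denote(t2, M, mu, rho, sat)

M = [{"p": True}, {"p": False}]
mu = {0: Fraction(1, 3), 1: Fraction(2, 3)}
rho = {"x": Fraction(1, 2)}
sat = lambda m, phi: m[phi]

prob_p = denote(("prob", "p"), M, mu, rho, sat)
```

With these toy values the probability of `p` is one third, so the inequality between
`("prob", "p")` and the variable `x` holds in one direction only.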


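$I^p$ is an institution. Indeed the functoriality of $\mathrm{Sen}^p$ is straightforward.
Concerning $\mathrm{Mod}^p$, and given $\sigma : \Sigma \to \Sigma'$, just note that indeed $(M, P, \rho) =
\mathrm{Mod}^p(\sigma)((M', P', \rho')) \in \mathrm{Mod}^p(\Sigma)$. Given $\varphi \in \mathrm{Sen}(\Sigma)$, $\{m \in M : m \Vdash_\Sigma \varphi\}$
is measurable, because $M = \mathrm{Mod}(\sigma)[M']$ and the following set is also measurable:
$\mathrm{Mod}(\sigma)^{-1}(\{m \in M : m \Vdash_\Sigma \varphi\}) \cap M' = \{m' \in M' : \mathrm{Mod}(\sigma)(m') \Vdash_\Sigma \varphi\} =
\{m' \in M' : m' \Vdash_{\Sigma'} \mathrm{Sen}(\sigma)(\varphi)\}$.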
The satisfaction condition of $I^p$ can be established by a simple induction on
formulas. The only interesting case is that of inequalities. Let $t \in T^p(\Sigma)$. For ease
of notation let $S' = (M', P', \rho')$ and $P' = (\mathcal{B}', \mu')$, $S = \mathrm{Mod}^p(\sigma)(S') = (M, P, \rho)$
and $P = \mathrm{Mod}(\sigma)(P') = (\mathcal{B}, \mu)$. We need to show that $[\![t]\!]^S = [\![T^p(\sigma)(t)]\!]^{S'}$. This
fact can be shown by a simple induction on terms. The interesting case concerns
the terms $(\int\varphi)$. Given $\varphi \in \mathrm{Sen}(\Sigma)$, using the definitions of term denotation
and of image of a probability space, the satisfaction condition of the base institution
$I$, and the definition of term translation, along with a little set-theoretical
manipulation, we have that

$$\begin{aligned}
[\![\textstyle\int\varphi]\!]^S &= \mu(\{m \in M : m \Vdash_\Sigma \varphi\}) \\
&= \mu'(\mathrm{Mod}(\sigma)^{-1}(\{m \in M : m \Vdash_\Sigma \varphi\}) \cap M') \\
&= \mu'(\{m' \in M' : \mathrm{Mod}(\sigma)(m') \Vdash_\Sigma \varphi\}) \\
&= \mu'(\{m' \in M' : m' \Vdash_{\Sigma'} \mathrm{Sen}(\sigma)(\varphi)\}) \\
&= [\![\textstyle\int\mathrm{Sen}(\sigma)(\varphi)]\!]^{S'} = [\![T^p(\sigma)(\textstyle\int\varphi)]\!]^{S'}.
\end{aligned}$$
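
The first step of this chain relies on the definition of the image of a probability space.
For finite spaces with the full powerset as Borel field (a simplifying assumption of ours),
that definition can be sketched as follows; `image_measure` and `measure` are hypothetical
helper names.

```python
from collections import defaultdict
from fractions import Fraction

def image_measure(f, point_mass):
    """Image of a finite probability space along f.

    point_mass maps each outcome to its probability. The result assigns to each
    image outcome w' the mass of its preimage, i.e. mu'({w'}) = mu(f^{-1}({w'})).
    """
    pushed = defaultdict(Fraction)
    for outcome, p in point_mass.items():
        pushed[f(outcome)] += p
    return dict(pushed)

def measure(point_mass, event):
    """mu(B) for an event B given as a set of outcomes."""
    return sum((p for w, p in point_mass.items() if w in event), Fraction(0))

mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
f = {"a": 0, "b": 0, "c": 1}.get        # outcomes "a" and "b" are identified
mu_prime = image_measure(f, mu)
```

The image outcome `0` receives the combined mass of `"a"` and `"b"`, which is exactly the
measure of its preimage in the original space.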


The term $(\int\varphi)$ denotes the probability of $\varphi$, interpreted as the probability
of the models of the base institution that satisfy $\varphi$. The logic resulting from the
probabilization of classical propositional logic was carefully studied in [5], where
a sound and weakly complete calculus could be obtained. The calculus extends the
one for the globalization of classical propositional logic by exploring the interplay
between the classical connectives and probability, and uses an oracle rule for
reasoning with real numbers. Although the logic enjoys the deduction theorem
with respect to global implication, strong completeness is out of reach simply
because the logic is not compact. Take, for instance, $\Delta = \{(r \le x) : r < 4\}$.
Clearly, $\Delta \vDash^p_\Sigma (4 \le x)$ but no finite subset of $\Delta$ does. Another relevant
remark is the fact that the operators $\Box$ and $\Diamond$ defined by $(\Box\varphi) = (1 \le \int\varphi)$
and $(\Diamond\varphi) = \boxminus(\int\varphi \le 0)$ behave as normal modalities.


In the general case depicted here, however, our aim is to establish the precise
relationship between the institutions $I$, $I^g$ and $I^p$.


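Proposition 3. The triple $C^{gp} = (\Phi^{gp}, \alpha^{gp}, \beta^{gp})$, where $\Phi^{gp}$ is the identity functor
on $\mathrm{Sig}$; for each $\Sigma$, $\alpha^{gp}_\Sigma$ translates each $\delta \in \mathrm{Sen}^g(\Sigma)$ to $\delta$; and for each $\Sigma$,
$\beta^{gp}_\Sigma$ translates each $(M, P, \rho) \in \mathrm{Mod}^p(\Sigma)$ to $M \in \mathrm{Mod}^g(\Sigma)$, is a comorphism
and fulfills the surjectivity condition.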


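Proof. The naturality of $\alpha^{gp}$ and $\beta^{gp}$ and the coherence condition are straightforward.
As for surjectivity, given a non-empty $M \subseteq \mathrm{Mod}(\Sigma)$ and $m \in M$, take
for instance the triple $S = (M, (\mathcal{B}, \mu), \rho)$ where $\mathcal{B} = 2^M$, $\mu(B) = 1$ if $m \in B$ and
$\mu(B) = 0$ otherwise, and $\rho$ is any assignment. Then $(\mathcal{B}, \mu)$ is a probability space
over $M$, and $\beta^{gp}_\Sigma(S) = M$. □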


As a corollary, by Proposition 1 and the observations therein, $C^{gp}$ shows that
$I^p$ is a conservative extension of $I^g$. By transitivity, $I^p$ is also a conservative
extension of $I$. Indeed, by composition, we also obtain a power-model comorphism
$C^p = C^{gp} \circ C^g : I \to I^p$ that fulfills the surjectivity condition.


5 Quantum Institution 


Finally, we turn our attention to the exogenous enrichment of a given logic with
quantum reasoning. In order to materialize the key idea of adopting superpositions
of models of the given logic as the models of the envisaged quantum logic,
let us start by recalling the relevant postulates of quantum physics (following
closely [4]) and setting up some important mathematical structures.


Postulate 4 Associated to any isolated quantum system is a Hilbert space. The
state of the system is described by a unit vector $|\psi\rangle$ in the Hilbert space.


For example, a quantum bit or qubit is associated to a Hilbert space of dimension
two: a state of a qubit is a vector $\alpha_0|0\rangle + \alpha_1|1\rangle$ where $\alpha_0, \alpha_1 \in \mathbb{C}$ and
$|\alpha_0|^2 + |\alpha_1|^2 = 1$. That is, the quantum state is a superposition of the two classical
states $|0\rangle$ and $|1\rangle$ of a classical bit. Therefore, from a logical point of view,
representing the qubit by a propositional constant, a quantum valuation is a
superposition of the two classical valuations.


Postulate 5 The Hilbert space associated to a quantum system composed of 
finitely many independent component systems is the tensor product of the com- 
ponent Hilbert spaces. 


For instance, a system composed of two independent qubits is associated
to a Hilbert space of dimension four: a state of such a system is a vector
$\alpha_{00}|00\rangle + \alpha_{01}|01\rangle + \alpha_{10}|10\rangle + \alpha_{11}|11\rangle$ where $\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11} \in \mathbb{C}$ and
$|\alpha_{00}|^2 + |\alpha_{01}|^2 + |\alpha_{10}|^2 + |\alpha_{11}|^2 = 1$. Again, representing the two qubits by two
propositional constants, a quantum valuation is a superposition of the four classical
valuations. So, the Hilbert space of the system composed of two independent
qubits is indeed the tensor product of the two Hilbert spaces, each corresponding
to a single qubit.
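
As a small numerical illustration, the product state of two independent qubits can be
computed directly from the component amplitudes. The dictionary encoding of states (bit
strings mapped to complex amplitudes) and the names `tensor` and `norm_squared` are
assumptions made for this sketch only.

```python
from math import isclose, sqrt

def tensor(state_a, state_b):
    """Tensor product of two states given as dicts from bit strings to amplitudes."""
    return {a + b: x * y for a, x in state_a.items() for b, y in state_b.items()}

def norm_squared(state):
    """Sum of the squared moduli of the amplitudes."""
    return sum(abs(amp) ** 2 for amp in state.values())

plus = {"0": 1 / sqrt(2), "1": 1 / sqrt(2)}      # (|0> + |1>)/sqrt(2)
phase = {"0": 1 / sqrt(2), "1": 1j / sqrt(2)}    # (|0> + i|1>)/sqrt(2)
pair = tensor(plus, phase)                       # a state over 00, 01, 10, 11
```

The resulting dictionary has one amplitude for each of the four classical valuations, and
the product of two unit vectors is again a unit vector.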


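Since we want to work with an arbitrary set of qubits, we will need the
following general construction. Given a nonempty set $E$, the free Hilbert space
over $E$ is $\mathcal{H}(E)$, the inner product space over $\mathbb{C}$ defined as follows: each element
is a map $|\psi\rangle : E \to \mathbb{C}$ such that $\{e \in E : |\psi\rangle(e) \neq 0\}$ is countable,
and $\sum_{e \in E} ||\psi\rangle(e)|^2 < \infty$; addition, scalar multiplication and inner product
are defined by $|\psi_1\rangle + |\psi_2\rangle = \lambda e.\, |\psi_1\rangle(e) + |\psi_2\rangle(e)$, $\alpha|\psi\rangle = \lambda e.\, \alpha|\psi\rangle(e)$, and
$\langle\psi_1|\psi_2\rangle = \sum_{e \in E} \overline{|\psi_1\rangle(e)}\, |\psi_2\rangle(e)$.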

As usual, the inner product induces the norm $\||\psi\rangle\| = \sqrt{\langle\psi|\psi\rangle}$, which in
turn induces the distance $d(|\psi_1\rangle, |\psi_2\rangle) = \||\psi_1\rangle - |\psi_2\rangle\|$. Since $\mathcal{H}(E)$ is complete
for this distance, $\mathcal{H}(E)$ is a Hilbert space. Clearly, $\{|e\rangle : e \in E\}$ is an orthonormal
basis of $\mathcal{H}(E)$, where $|e\rangle(e) = 1$ and $|e\rangle(e') = 0$ for every $e' \neq e$. A unit
vector of $\mathcal{H}(E)$ is just a vector $|\psi\rangle \in \mathcal{H}(E)$ such that $\||\psi\rangle\| = 1$.


Let $Q$ be the set of qubits in hand. If there are no dependencies between
the qubits then the system is described by the Hilbert space $\mathcal{H}(2^Q)$, where $2^Q$
is the set of all classical valuations. However, in many cases, we will be given
a finite partition $S = \{Q_1, \ldots, Q_n\}$ of $Q$, giving rise to $n$ independent subsystems.
In the sequel, we will use $\bigcup S$ to denote the set $\{\bigcup_{Q_i \in R} Q_i : R \subseteq S\}$.
Moreover, it may also be that the qubits $Q_i$ of each isolated subsystem are also
constrained and some of the classical valuations in $2^{Q_i}$ are impossible. Any set
$V \subseteq 2^Q$ of admissible classical valuations induces a set of admissible classical
valuations for each subsystem, that is, $V_i = \{v_i : v \in V\}$ with $v_i = v|_{Q_i}$.
Analogously, we will use $v_R$ to denote the restriction $v|_R$ of a valuation $v$ to
$R \in \bigcup S$, and $V_R = \{v_R : v \in V\}$. Then, the space describing the corresponding
quantum system will be the tensor product $\bigotimes_{i=1}^n \mathcal{H}(V_i)$. Still, note that
although $(2^Q)_i = 2^{Q_i}$ and $2^Q = \prod_{i=1}^n 2^{Q_i}$, in general $V \subsetneq \prod_{i=1}^n V_i$. Moreover,
although $\mathcal{H}(2^Q) = \bigotimes_{i=1}^n \mathcal{H}(2^{Q_i}) = \mathcal{H}(\prod_{i=1}^n 2^{Q_i})$, in general we have that
$\mathcal{H}(V) \subseteq \bigotimes_{i=1}^n \mathcal{H}(V_i) \subseteq \mathcal{H}(\prod_{i=1}^n V_i)$.


Hence, we should only consider quantum states of $\bigotimes_{i=1}^n \mathcal{H}(V_i)$ that are compatible
with $V$. Given the subspace relations stated above, we shall call a structured
quantum state over $V$ and $S$ a family $|\psi\rangle = \{|\psi_i\rangle\}_{i=1}^n$ such that each
$|\psi_i\rangle$ is a unit vector of $\mathcal{H}(V_i)$; and $\langle v|(\bigotimes_{i=1}^n |\psi_i\rangle) = \prod_{i=1}^n \langle v_i|\psi_i\rangle = 0$ if $v \notin V$.

Note that it is easy to identify $\bigotimes_{i=1}^n |\psi_i\rangle$ with a unique unit vector in $\mathcal{H}(V)$
since all the amplitudes on valuations not in $V$ are null. Hence, by abuse of
notation, we shall also use $|\psi\rangle$ to denote $\bigotimes_{i=1}^n |\psi_i\rangle$.
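
The compatibility requirement can be checked by brute force on small examples. The sketch
below is only illustrative and rests on our own encoding: valuations are dictionaries from
qubits to bits, the partition `S` is a list of tuples of qubits, and each factor state is a
dictionary from local valuations to amplitudes. The function name `is_structured_state`
is hypothetical.

```python
from itertools import product
from math import isclose, sqrt

def restrict(v, block):
    """Restriction of a valuation v (dict qubit -> bit) to the qubits of a block."""
    return tuple(v[q] for q in block)

def is_structured_state(V, S, factors, tol=1e-9):
    """Check that each factor is a unit vector over V_i and that the product
    amplitude vanishes on every combination of local valuations outside V."""
    V_blocks = [{restrict(v, block) for v in V} for block in S]
    for V_i, psi_i in zip(V_blocks, factors):
        unit = isclose(sum(abs(a) ** 2 for a in psi_i.values()), 1.0)
        if not unit or not set(psi_i) <= V_i:
            return False
    admissible = {tuple(restrict(v, block) for block in S) for v in V}
    for combo in product(*V_blocks):
        amplitude = 1
        for local, psi_i in zip(combo, factors):
            amplitude *= psi_i.get(local, 0)
        if combo not in admissible and abs(amplitude) > tol:
            return False
    return True

# Two qubits a and b, each in its own block; the valuation a = b = 1 is excluded.
V = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}]
S = [("a",), ("b",)]
zero = {(0,): 1.0}
plus = {(0,): 1 / sqrt(2), (1,): 1 / sqrt(2)}
```

In this toy setting the family `[zero, plus]` is compatible with `V`, whereas
`[plus, plus]` is not, since it puts a non-null amplitude on the excluded valuation.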


Now, we turn our attention to the postulates concerning measurements of 
physical quantities. 


Postulate 6 Every measurable physical quantity of an isolated quantum system
is described by an observable$^1$ acting on its Hilbert space.


Postulate 7 The possible outcomes of the measurement of a physical quantity
are the eigenvalues of the corresponding observable. When the physical quantity
is measured using observable $A$ on a system in a state $|\psi\rangle$, the resulting outcomes
are ruled by the probability space $\mathrm{Prob}^A_{|\psi\rangle} = (\Omega, \mathcal{B}_\Omega, \mu^A_{|\psi\rangle})$ where, in the case of
a countable spectrum, $\mu^A_{|\psi\rangle}(B) = \sum_{\lambda \in \Omega} \chi_B(\lambda)\, \|P_\lambda|\psi\rangle\|^2$.


For the applications we have in mind in quantum computation and information,
only logical projective measurements are relevant. In general, the stochastic
result of making a logical projective measurement of the system at a structured
quantum state $|\psi\rangle$ determined as above is fully described by the probability
space $(2^V, \mu_{|\psi\rangle})$ over $V$ where $\mu_{|\psi\rangle}(B) = \sum_{v \in B} |\langle v|\psi\rangle|^2$ for every $B \subseteq V$.
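
For a finite set of valuations, the measure induced by a logical projective measurement is
a plain sum of squared moduli. The sketch below assumes states stored as dictionaries from
valuations (written as bit strings) to complex amplitudes; `measurement_probability` is a
hypothetical helper name.

```python
from math import isclose, sqrt

def measurement_probability(state, event):
    """Sum, over the valuations v in the event, of the squared modulus of the
    amplitude of v in the state (absent valuations have null amplitude)."""
    return sum(abs(state.get(v, 0)) ** 2 for v in event)

# An illustrative state over two qubits, supported on the valuations 00 and 11.
psi = {"00": 1 / sqrt(2), "11": 1j / sqrt(2)}
first_qubit_is_one = {"10", "11"}
p = measurement_probability(psi, first_qubit_is_one)
```

Note that the complex phase of an amplitude plays no role in the resulting probability.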


In the sequel, we will need to be able to map quantum systems and states
across qubit maps. Let $f : U \to U'$, $Q \subseteq U$ and $Q' = f[Q]$. Then, the function
$f^* : 2^{Q'} \to 2^Q$ defined by $f^*(v')(q) = v'(f(q))$ is injective: if $f^*(v'_1) = f^*(v'_2)$
then, for each $q \in Q$, $v'_1(f(q)) = v'_2(f(q))$, which implies that $v'_1 = v'_2$ since
$Q' = f[Q]$. Hence, $f^*$ establishes a bijection between any given set of classical
valuations $V' \subseteq 2^{Q'}$ and $V = f^*[V'] \subseteq 2^Q$. Therefore, $f^*$ also establishes an
isomorphism between the Hilbert spaces $\mathcal{H}(V')$ and $\mathcal{H}(V)$ obtained by mapping
$|\psi'\rangle \in \mathcal{H}(V')$ to $|\psi\rangle = f^*(|\psi'\rangle)$ such that $|\psi\rangle(f^*(v')) = |\psi'\rangle(v')$.

Moreover, note that every finite partition $S' = \{Q'_1, \ldots, Q'_n\}$ of $Q'$ induces a partition
$S = f^{-1}[S'] = \{Q_1, \ldots, Q_n\}$ of $Q$ with each $Q_i = f^{-1}(Q'_i) \cap Q$. Hence, since
surjectivity guarantees that each $Q'_i = f[Q_i]$, the Hilbert space isomorphism
established in the preceding paragraph by $f^*$ also applies to the subsystems, that
is, $\mathcal{H}(V'_i)$ and $\mathcal{H}(V_i)$ are isomorphic.
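
The injectivity argument can be replayed on a small example. The following sketch uses
dictionaries for both the qubit map and the valuations; the names `pullback` and `freeze`,
as well as the concrete qubits, are assumptions introduced for illustration.

```python
from itertools import product

def pullback(f, Q, v_prime):
    """Pull a valuation of Q' = f[Q] back to a valuation of Q by precomposing with f."""
    return {q: v_prime[f[q]] for q in Q}

def freeze(valuation):
    """Hashable form of a valuation, so that valuations can be collected in sets."""
    return frozenset(valuation.items())

Q = ["q1", "q2", "q3"]
f = {"q1": "a", "q2": "a", "q3": "b"}            # a qubit map with Q' = {"a", "b"}
Q_prime = sorted(set(f.values()))
valuations_prime = [dict(zip(Q_prime, bits))
                    for bits in product([0, 1], repeat=len(Q_prime))]
images = {freeze(pullback(f, Q, v)) for v in valuations_prime}
```

Distinct valuations of the two target qubits yield distinct valuations of the three source
qubits, even though the qubit map itself is not injective.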




1 Recall that an observable is a Hermitian operator such that the direct sum of its
eigensubspaces coincides with the underlying Hilbert space. Since the operator is
Hermitian, its spectrum $\Omega$ (the set of its eigenvalues) is a subset of $\mathbb{R}$. For each
$\lambda \in \Omega$, we denote the corresponding eigensubspace by $E_\lambda$ and the projector onto $E_\lambda$
by $P_\lambda$.


We now characterize the exogenous enrichment of a given institution $I$ with
quantum reasoning. As in the previous cases, we shall first define the envisaged
quantum institution $I^q$ and then characterize its relationship to $I$, as well as to
the institutions previously built. To this end, qubits will be selected formulas
of the original logic, which induce upon observation a probability distribution on
models of the original institution. The notation $I^q$ is a little abusive here, since
the enrichment will be parameterized by a functor that chooses the qubits of
interest. Hence, we consider fixed a functor $\mathrm{Qb} : \mathrm{Sig} \to \mathrm{Set}$ such that, for every
signature $\Sigma$, $\mathrm{Qb}(\Sigma) \subseteq \mathrm{Sen}(\Sigma)$ and, for every signature morphism $\sigma : \Sigma \to \Sigma'$,
$\mathrm{Qb}(\sigma) = \mathrm{Sen}(\sigma)|_{\mathrm{Qb}(\Sigma)}$ and $\mathrm{Sen}(\sigma)[\mathrm{Qb}(\Sigma)] = \mathrm{Qb}(\Sigma')$. Note that $\mathrm{Sen}(\sigma)$ is
required to be surjective on qubits, and that this requirement is essential in the
subsequent development of the $I^q$ institution.

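Clearly, models of the given institution induce classical valuations on the
qubits. We denote by $V_\Sigma : \mathrm{Mod}(\Sigma) \to 2^{\mathrm{Qb}(\Sigma)}$ the map defined, for each qubit
$\varphi \in \mathrm{Qb}(\Sigma)$, by

$$V_\Sigma(m)(\varphi) = \begin{cases} 1 & \text{if } m \Vdash_\Sigma \varphi \\ 0 & \text{otherwise.} \end{cases}$$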


To fulfill the original idea of working with quantum superpositions of models
of the original institution, we will have to restrict our attention to sets of models
$M \subseteq \mathrm{Mod}(\Sigma)$ on which $V_\Sigma$ is injective, that is, if $m_1, m_2 \in M$ and $m_1 \neq m_2$
then $V_\Sigma(m_1) \neq V_\Sigma(m_2)$. In this way, we have a bijection between $M$ and $V_\Sigma[M]$.

Given $A \subseteq F \subseteq \mathrm{Qb}(\Sigma)$, we shall denote by $v^F_A \in 2^F$ the classical valuation
of the qubits in $F$ defined by: $v^F_A(\varphi)$ is 1 if $\varphi \in A$ and is 0 otherwise.
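
The injectivity requirement is easy to test on finite sets of models. In the sketch below,
models are dictionaries, qubits are propositional symbols, and `sat` is a toy satisfaction
relation; all of these, together with the helper names, are assumptions made for the sake
of illustration.

```python
def induced_valuation(m, qubits, sat):
    """Valuation induced by m: a qubit gets 1 if m satisfies it and 0 otherwise."""
    return tuple(1 if sat(m, phi) else 0 for phi in qubits)

def injective_on(M, qubits, sat):
    """True when distinct models in M induce distinct valuations on the qubits."""
    seen = [induced_valuation(m, qubits, sat) for m in M]
    return len(set(seen)) == len(seen)

sat = lambda m, phi: m[phi]                      # toy satisfaction: models are dicts
m1 = {"p": True, "q": False, "r": True}
m2 = {"p": True, "q": False, "r": False}
m3 = {"p": False, "q": True, "r": True}
```

With the qubits `p` and `q` only, the models `m1` and `m2` are indistinguishable, so the
set containing both is ruled out; adding `r` as a qubit separates them.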

The syntax of the logic will also be augmented, not only with probabilistic
reasoning, but also in order to allow us to manipulate complex amplitudes and
to talk about qubit independence. Hence, besides the set $X$ of real variables,
we also assume fixed a set $Z$ of complex variables.


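Definition 8. The quantum institution $I^q = (\mathrm{Sig}, \mathrm{Sen}^q, \mathrm{Mod}^q, \Vdash^q)$ based on $I$
(and $\mathrm{Qb}$) is defined as follows:

- $\mathrm{Sen}^q(\Sigma)$ is the least set including $\mathrm{Sen}(\Sigma)$ such that: $(\boxminus\delta), (\delta_1 \sqsupset \delta_2) \in \mathrm{Sen}^q(\Sigma)$
  if $\delta, \delta_1, \delta_2 \in \mathrm{Sen}^q(\Sigma)$; $[F] \in \mathrm{Sen}^q(\Sigma)$ if $F \subseteq \mathrm{Qb}(\Sigma)$; and $(t_1 \le t_2) \in \mathrm{Sen}^q(\Sigma)$
  if $t_1, t_2 \in T^q_{\mathbb{R}}(\Sigma)$, where the sets $T^q_{\mathbb{R}}(\Sigma)$ and $T^q_{\mathbb{C}}(\Sigma)$ (of real valued and complex
  valued terms, respectively) are defined by mutual induction as follows:
  $X, \mathbb{R} \subseteq T^q_{\mathbb{R}}(\Sigma)$; $(\int\varphi) \in T^q_{\mathbb{R}}(\Sigma)$ if $\varphi \in \mathrm{Sen}(\Sigma)$, $(t_1 + t_2), (t_1 . t_2) \in T^q_{\mathbb{R}}(\Sigma)$
  if $t_1, t_2 \in T^q_{\mathbb{R}}(\Sigma)$, and $\mathrm{Re}(u), \mathrm{Im}(u), \arg(u), |u| \in T^q_{\mathbb{R}}(\Sigma)$ if $u \in T^q_{\mathbb{C}}(\Sigma)$;
  $Z \subseteq T^q_{\mathbb{C}}(\Sigma)$, $|\top\rangle_{FA} \in T^q_{\mathbb{C}}(\Sigma)$ if $A \subseteq F \subseteq \mathrm{Qb}(\Sigma)$, $(t_1 + i t_2), (t_1 . e^{i t_2}) \in T^q_{\mathbb{C}}(\Sigma)$
  if $t_1, t_2 \in T^q_{\mathbb{R}}(\Sigma)$, $\overline{u} \in T^q_{\mathbb{C}}(\Sigma)$ if $u \in T^q_{\mathbb{C}}(\Sigma)$, $(u_1 + u_2), (u_1 . u_2) \in T^q_{\mathbb{C}}(\Sigma)$
  if $u_1, u_2 \in T^q_{\mathbb{C}}(\Sigma)$, and $(\varphi \rhd u_1; u_2) \in T^q_{\mathbb{C}}(\Sigma)$ if $\varphi \in \mathrm{Sen}(\Sigma)$ and $u_1, u_2 \in T^q_{\mathbb{C}}(\Sigma)$;
- $\mathrm{Sen}^q(\sigma) = \sigma^q$ is defined inductively by: $\sigma^q(\varphi) = \mathrm{Sen}(\sigma)(\varphi)$, $\sigma^q(\boxminus\delta) = (\boxminus\sigma^q(\delta))$,
  $\sigma^q(\delta_1 \sqsupset \delta_2) = (\sigma^q(\delta_1) \sqsupset \sigma^q(\delta_2))$, $\sigma^q([F]) = [\mathrm{Sen}(\sigma)[F]]$, and
  $\sigma^q(t_1 \le t_2) = (T^q_{\mathbb{R}}(\sigma)(t_1) \le T^q_{\mathbb{R}}(\sigma)(t_2))$, where $T^q_{\mathbb{R}}(\sigma) = \sigma^q_{\mathbb{R}}$ and $T^q_{\mathbb{C}}(\sigma) = \sigma^q_{\mathbb{C}}$ are
  defined by mutual induction: $\sigma^q_{\mathbb{R}}(x) = x$, $\sigma^q_{\mathbb{R}}(r) = r$, $\sigma^q_{\mathbb{R}}(\int\varphi) = (\int\mathrm{Sen}(\sigma)(\varphi))$,
  $\sigma^q_{\mathbb{R}}(t_1 + t_2) = (\sigma^q_{\mathbb{R}}(t_1) + \sigma^q_{\mathbb{R}}(t_2))$, $\sigma^q_{\mathbb{R}}(t_1 . t_2) = (\sigma^q_{\mathbb{R}}(t_1) . \sigma^q_{\mathbb{R}}(t_2))$,
  $\sigma^q_{\mathbb{R}}(\mathrm{Re}(u)) = \mathrm{Re}(\sigma^q_{\mathbb{C}}(u))$, $\sigma^q_{\mathbb{R}}(\mathrm{Im}(u)) = \mathrm{Im}(\sigma^q_{\mathbb{C}}(u))$, $\sigma^q_{\mathbb{R}}(\arg(u)) = \arg(\sigma^q_{\mathbb{C}}(u))$,
  $\sigma^q_{\mathbb{R}}(|u|) = |\sigma^q_{\mathbb{C}}(u)|$, $\sigma^q_{\mathbb{C}}(z) = z$, $\sigma^q_{\mathbb{C}}(|\top\rangle_{FA}) = |\top\rangle_{\mathrm{Sen}(\sigma)[F]\,\mathrm{Sen}(\sigma)[A]}$,
  $\sigma^q_{\mathbb{C}}(t_1 + i t_2) = (\sigma^q_{\mathbb{R}}(t_1) + i \sigma^q_{\mathbb{R}}(t_2))$, $\sigma^q_{\mathbb{C}}(t_1 . e^{i t_2}) = (\sigma^q_{\mathbb{R}}(t_1) . e^{i \sigma^q_{\mathbb{R}}(t_2)})$,
  $\sigma^q_{\mathbb{C}}(\overline{u}) = \overline{\sigma^q_{\mathbb{C}}(u)}$, $\sigma^q_{\mathbb{C}}(u_1 + u_2) = (\sigma^q_{\mathbb{C}}(u_1) + \sigma^q_{\mathbb{C}}(u_2))$,
  $\sigma^q_{\mathbb{C}}(u_1 . u_2) = (\sigma^q_{\mathbb{C}}(u_1) . \sigma^q_{\mathbb{C}}(u_2))$, and
  $\sigma^q_{\mathbb{C}}(\varphi \rhd u_1; u_2) = (\mathrm{Sen}(\sigma)(\varphi) \rhd \sigma^q_{\mathbb{C}}(u_1); \sigma^q_{\mathbb{C}}(u_2))$;
- $\mathrm{Mod}^q(\Sigma)$ is the class of all tuples $(M, S, |\psi\rangle, \nu, \rho)$ where: $\emptyset \neq M \subseteq \mathrm{Mod}(\Sigma)$
  is such that $V_\Sigma$ is injective on $M$, $S$ is a finite partition of $\mathrm{Qb}(\Sigma)$, $|\psi\rangle$ is a structured
  quantum state over $V_\Sigma[M]$ and $S$, $\nu = \{\nu_{FA}\}_{A \subseteq F \subseteq \mathrm{Qb}(\Sigma)}$ is a family
  of complex numbers such that, whenever $F \in \bigcup S$, $\nu_{FA} = \langle v^F_A|(\bigotimes_{Q_i \subseteq F} |\psi_i\rangle)$
  if $v^F_A \in V_F$, and $\nu_{FA} = 0$ if $v^F_A \notin V_F$, and $\rho$ is an assignment such that
  $\rho(x) \in \mathbb{R}$ for every $x \in X$, and $\rho(z) \in \mathbb{C}$ for every $z \in Z$;
- $\mathrm{Mod}^q(\sigma)((M', S', |\psi'\rangle, \nu', \rho')) = (\mathrm{Mod}(\sigma)[M'], \sigma^{-1}[S'], \sigma^*(|\psi'\rangle), \nu, \rho')$ with
  $\sigma^* = \mathrm{Sen}(\sigma)^*$, $\sigma^{-1} = \mathrm{Sen}(\sigma)^{-1}$ and $\nu_{FA} = \nu'_{\mathrm{Sen}(\sigma)[F]\,\mathrm{Sen}(\sigma)[A]}$;
- $\Vdash^q_\Sigma$ is defined inductively by: $W \Vdash^q_\Sigma \varphi$ iff $M \Vdash_\Sigma \varphi$, for $\varphi \in \mathrm{Sen}(\Sigma)$, $W \Vdash^q_\Sigma [F]$
  iff $F \in \bigcup S$, and $W \Vdash^q_\Sigma (t_1 \le t_2)$ iff $[\![t_1]\!]^W_{\mathbb{R}} \le [\![t_2]\!]^W_{\mathbb{R}}$, where the denotations
  of real terms $[\![\_]\!]^W_{\mathbb{R}} : T^q_{\mathbb{R}}(\Sigma) \to \mathbb{R}$ and of complex terms $[\![\_]\!]^W_{\mathbb{C}} : T^q_{\mathbb{C}}(\Sigma) \to \mathbb{C}$
  are defined by mutual induction as follows: $[\![x]\!]^W_{\mathbb{R}} = \rho(x)$, for $x \in X$,
  $[\![r]\!]^W_{\mathbb{R}} = r$, for $r \in \mathbb{R}$, $[\![\int\varphi]\!]^W_{\mathbb{R}} = \mu_{|\psi\rangle}(V_\Sigma[\{m \in M : m \Vdash_\Sigma \varphi\}])$,
  $[\![t_1 + t_2]\!]^W_{\mathbb{R}} = [\![t_1]\!]^W_{\mathbb{R}} + [\![t_2]\!]^W_{\mathbb{R}}$, $[\![t_1 . t_2]\!]^W_{\mathbb{R}} = [\![t_1]\!]^W_{\mathbb{R}} . [\![t_2]\!]^W_{\mathbb{R}}$,
  $[\![\mathrm{Re}(u)]\!]^W_{\mathbb{R}} = \mathrm{Re}([\![u]\!]^W_{\mathbb{C}})$, $[\![\mathrm{Im}(u)]\!]^W_{\mathbb{R}} = \mathrm{Im}([\![u]\!]^W_{\mathbb{C}})$, $[\![\arg(u)]\!]^W_{\mathbb{R}} = \arg([\![u]\!]^W_{\mathbb{C}})$,
  and $[\![|u|]\!]^W_{\mathbb{R}} = |[\![u]\!]^W_{\mathbb{C}}|$; $[\![z]\!]^W_{\mathbb{C}} = \rho(z)$, for $z \in Z$, $[\![|\top\rangle_{FA}]\!]^W_{\mathbb{C}} = \nu_{FA}$,
  $[\![t_1 + i t_2]\!]^W_{\mathbb{C}} = [\![t_1]\!]^W_{\mathbb{R}} + i [\![t_2]\!]^W_{\mathbb{R}}$, $[\![t_1 . e^{i t_2}]\!]^W_{\mathbb{C}} = [\![t_1]\!]^W_{\mathbb{R}} . e^{i [\![t_2]\!]^W_{\mathbb{R}}}$,
  $[\![\overline{u}]\!]^W_{\mathbb{C}} = \overline{[\![u]\!]^W_{\mathbb{C}}}$, $[\![u_1 + u_2]\!]^W_{\mathbb{C}} = [\![u_1]\!]^W_{\mathbb{C}} + [\![u_2]\!]^W_{\mathbb{C}}$,
  $[\![u_1 . u_2]\!]^W_{\mathbb{C}} = [\![u_1]\!]^W_{\mathbb{C}} . [\![u_2]\!]^W_{\mathbb{C}}$, and

  $$[\![\varphi \rhd u_1; u_2]\!]^W_{\mathbb{C}} = \begin{cases} [\![u_1]\!]^W_{\mathbb{C}} & \text{if } M \Vdash_\Sigma \varphi \\ [\![u_2]\!]^W_{\mathbb{C}} & \text{otherwise.} \end{cases}$$
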

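$I^q$ is an institution. Indeed the functoriality of $\mathrm{Sen}^q$ is straightforward.
Concerning $\mathrm{Mod}^q$, and given a signature morphism $\sigma : \Sigma \to \Sigma'$, note that indeed
$W = (M, S, |\psi\rangle, \nu, \rho) = \mathrm{Mod}^q(\sigma)(W') \in \mathrm{Mod}^q(\Sigma)$ if $W' = (M', S', |\psi'\rangle, \nu', \rho') \in
\mathrm{Mod}^q(\Sigma')$. In particular, since $M = \mathrm{Mod}(\sigma)[M']$, then $\mathrm{Sen}(\sigma)^*[V_{\Sigma'}[M']] =
V_\Sigma[M]$ just because $V_\Sigma(\mathrm{Mod}(\sigma)(m'))(\varphi) = V_{\Sigma'}(m')(\mathrm{Sen}(\sigma)(\varphi))$ for every $m' \in
\mathrm{Mod}(\Sigma')$ and $\varphi \in \mathrm{Qb}(\Sigma)$, due to the satisfaction condition of the original
institution. Moreover, if $A \subseteq F \subseteq \mathrm{Qb}(\Sigma)$, and we let $F' = \mathrm{Sen}(\sigma)[F]$ and
$A' = \mathrm{Sen}(\sigma)[A]$, the definition of $\nu_{FA} = \nu'_{F'A'}$ is suitable. First note that
$v^{F'}_{A'} \in V_{\Sigma'}[M']$ iff $v^F_A \in V_\Sigma[M]$ just because $\mathrm{Sen}(\sigma)^*(v^{F'}_{A'}) = v^F_A$. Moreover,
$F \in \bigcup S$ iff $F' \in \bigcup S'$.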


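The satisfaction condition of $I^q$ can be established by a simple induction on
formulas and on terms. The only interesting cases concern independence formulas
$[F]$, plus probability $(\int\varphi)$ and amplitude $|\top\rangle_{FA}$ terms. In the first case, we
need to show that $W \Vdash^q_\Sigma [F]$ iff $W' \Vdash^q_{\Sigma'} [F']$. The result follows immediately
because $F \in \bigcup S$ iff $\mathrm{Sen}(\sigma)[F] \in \bigcup S'$. In the second case, we need to show that
$[\![\int\varphi]\!]^W_{\mathbb{R}} = [\![\int\mathrm{Sen}(\sigma)(\varphi)]\!]^{W'}_{\mathbb{R}}$. Indeed, using the bijection between $M$ and $V_\Sigma[M]$,
the fact that $M = \mathrm{Mod}(\sigma)[M']$, the satisfaction condition of the institution $I$,
and as a result the fact that $\mathrm{Sen}(\sigma)^*(V_{\Sigma'}(m')) = V_\Sigma(\mathrm{Mod}(\sigma)(m'))$, we have
that

$$\begin{aligned}
[\![\textstyle\int\varphi]\!]^W_{\mathbb{R}} &= \mu_{|\psi\rangle}(V_\Sigma[\{m \in M : m \Vdash_\Sigma \varphi\}]) \\
&= \sum_{m \in M : m \Vdash_\Sigma \varphi} |\langle V_\Sigma(m)|\psi\rangle|^2 = \sum_{m \in M : m \Vdash_\Sigma \varphi} ||\psi\rangle(V_\Sigma(m))|^2 \\
&= \sum_{m \in M : m \Vdash_\Sigma \varphi} |\mathrm{Sen}(\sigma)^*(|\psi'\rangle)(V_\Sigma(m))|^2 \\
&= \sum_{m' \in M' : m' \Vdash_{\Sigma'} \mathrm{Sen}(\sigma)(\varphi)} |\mathrm{Sen}(\sigma)^*(|\psi'\rangle)(\mathrm{Sen}(\sigma)^*(V_{\Sigma'}(m')))|^2 \\
&= \sum_{m' \in M' : m' \Vdash_{\Sigma'} \mathrm{Sen}(\sigma)(\varphi)} ||\psi'\rangle(V_{\Sigma'}(m'))|^2 \\
&= \sum_{m' \in M' : m' \Vdash_{\Sigma'} \mathrm{Sen}(\sigma)(\varphi)} |\langle V_{\Sigma'}(m')|\psi'\rangle|^2 \\
&= \mu_{|\psi'\rangle}(V_{\Sigma'}[\{m' \in M' : m' \Vdash_{\Sigma'} \mathrm{Sen}(\sigma)(\varphi)\}]) = [\![\textstyle\int\mathrm{Sen}(\sigma)(\varphi)]\!]^{W'}_{\mathbb{R}}.
\end{aligned}$$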





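In the third case, we need to show that $[\![|\top\rangle_{FA}]\!]^W_{\mathbb{C}} = [\![|\top\rangle_{F'A'}]\!]^{W'}_{\mathbb{C}}$. Since we
already know that $F \in \bigcup S$ iff $F' \in \bigcup S'$, and that $v^{F'}_{A'} \in V_{\Sigma'}[M']$ iff $v^F_A \in V_\Sigma[M]$,
it suffices to verify that, when it makes sense,

$$\begin{aligned}
\langle v^F_A|(\textstyle\bigotimes_{Q_i \subseteq F} |\psi_i\rangle) &= \prod_{Q_i \subseteq F} \langle (v^F_A)_i|\psi_i\rangle = \prod_{Q_i \subseteq F} |\psi_i\rangle((v^F_A)_i) \\
&= \prod_{Q'_i \subseteq F'} \mathrm{Sen}(\sigma)^*(|\psi'_i\rangle)(\mathrm{Sen}(\sigma)^*((v^{F'}_{A'})_i)) = \prod_{Q'_i \subseteq F'} |\psi'_i\rangle((v^{F'}_{A'})_i) \\
&= \prod_{Q'_i \subseteq F'} \langle (v^{F'}_{A'})_i|\psi'_i\rangle = \langle v^{F'}_{A'}|(\textstyle\bigotimes_{Q'_i \subseteq F'} |\psi'_i\rangle).
\end{aligned}$$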


Most of the syntactic constructions introduced in $I^q$ are self explanatory. The
quantum specific constructs, besides all the operations on complex numbers, are
the $[F]$ formulas and the $|\top\rangle_{FA}$ terms. Intuitively, $[F]$ holds if the qubits in $F$
form an independent subsystem of the whole, whereas $|\top\rangle_{FA}$ evaluates, whenever
it is meaningful, to the complex amplitude of the vector $|v^F_A\rangle$ in the current
state of the system. The logic resulting from the quantization of classical propositional
logic was introduced and studied in [1] [2]. A sound and weakly complete
calculus for the logic was obtained in [3] using an iterated Henkin construction
inspired by the technique in [13]. The qubits of interest in this case were the
propositional symbols. Using the logic it is possible, for instance, to model and
reason about quantum states corresponding to the famous case of Schrödinger's
cat. The relevant attributes of the cat are cat-in-box, cat-alive, cat-moving:
being inside or outside the box, alive or dead, and moving, respectively. The
following formulas constrain the state of the cat at different levels of detail:

1. [cat-in-box, cat-alive, cat-moving];
2. (cat-moving $\Rightarrow$ cat-alive);
3. (($\Diamond$ cat-alive) $\sqcap$ ($\Diamond$ ($\neg$ cat-alive)));
4. ($\boxminus$ [cat-alive]);
5. ($\int$ cat-alive $= \frac{1}{3}$).


Observe that the assertions are jointly consistent. They characterize the
quantum states where: the qubits cat-in-box, cat-alive, cat-moving are not
entangled with other qubits; the cat is moving only if it is alive; it is possible that
the cat is alive and also that the cat is dead; the qubit cat-alive is entangled
with the others; and the probability of observing the cat alive (after collapsing
the wave function) is $\frac{1}{3}$.

Our aim is now to relate $I^q$ with $I$, $I^g$, $I^p$.


Proposition 4. The triple $C^{pq} = (\Phi^{pq}, \alpha^{pq}, \beta^{pq})$, where $\Phi^{pq}$ is the identity functor
on $\mathrm{Sig}$; for each $\Sigma$, $\alpha^{pq}_\Sigma$ translates each $\delta \in \mathrm{Sen}^p(\Sigma)$ to $\delta$; and for each $\Sigma$, $\beta^{pq}_\Sigma$
translates each $(M, S, |\psi\rangle, \nu, \rho) \in \mathrm{Mod}^q(\Sigma)$ to $(M, (2^M, \mu), \rho|_X)$ with $\mu(B) =
\mu_{|\psi\rangle}(V_\Sigma[B])$, is a comorphism $C^{pq} : I^p \to I^q$.


Proof. The naturality of the transformation $\alpha^{pq}$ is straightforward. Concerning
$\beta^{pq}$, just note that given $W = (M, S, |\psi\rangle, \nu, \rho) \in \mathrm{Mod}^q(\Sigma)$, $\beta^{pq}_\Sigma(W)$ is well
defined. The probability space $(2^M, \mu)$ over $M$ is just an isomorphic copy of
$(2^{V_\Sigma[M]}, \mu_{|\psi\rangle})$ over $V_\Sigma[M]$. It is clearly a probability space, and its naturality
follows easily. The coherence condition is trivial. $\Box$


Note however that, in general, $C^{pq}$ does not satisfy the surjectivity condition,
and thus $I^q$ is not a conservative extension of $I^p$. This happens for two essential
reasons: first, the sets $M$ of models that appear in quantum models must be in
one-to-one correspondence with their induced classical valuations on the qubits;
second, even for such an $M$, due to the independence partitions, not all probability
spaces over $M$ can be obtained from a quantum structure. Of course, by
composition, we also obtain a comorphism $C^{gq} = C^{pq} \circ C^{gp}: I^g \to I^q$, and a
power-model comorphism $C^q = C^{pq} \circ C^p: I \to I^q$. It is very easy to check that
$C^q$ meets the necessary surjectivity condition, and therefore $I^q$ is still a conservative
extension of $I$. Given $\Sigma$ and a model $m$ of $I$, we just need to consider any
quantum structure of the form $(\{m\}, \{\mathrm{Qb}(\Sigma)\}, |V_\Sigma(m)\rangle, \nu, \rho)$ with $\nu_{FA} = 1$ if
$F = \mathrm{Qb}(\Sigma)$ and $v^F_A = V_\Sigma(m)$, or $F = \emptyset$, and $\nu_{FA} = 0$ otherwise. On the other
hand, it is easy to see that $C^{gq}$ will also be surjective, and hence $I^q$ a conservative
extension of $I^g$, whenever the first of the above mentioned restrictions is trivial.
That is, requiring that $M$ is in one-to-one correspondence with its induced set
of valuations should not exclude any possible set of models. For this condition
to hold, it suffices to require that the qubit functor Qb is chosen in such a way
that, for each $\Sigma$ and $m_1, m_2 \in \mathrm{Mod}(\Sigma)$, if $m_1$ and $m_2$ coincide on the satisfaction
of all qubits then $m_1 = m_2$. If the qubits are representative, typically one
ends up with logically equivalent models, but in many institutions it is possible
to avoid having logically equivalent models. The case of classical propositional
logic is paradigmatic, once we take as qubits all the propositional symbols. But
similar choices are possible in many other logics. It has been shown how to make
this choice in any suitable finitely-valued logic. For instance, in Łukasiewicz's
three-valued logic it suffices to consider as qubits all propositional symbols and
negations of propositional symbols. This possibility also helps in shedding light
on the usefulness of considering restricted sets of admissible valuations.


6 Conclusion 


Figure 1 is the diagram of the institutions and (power-model) comorphisms
we have built, where — is used to distinguish the arrows that guarantee a 
conservative extension from their source to target. Our main goal in bringing into 
the realm of institutions the exogenous approach to globalization, probabilization 
and quantization of logics was to assess how general these constructions were. 
The first two constructions are fully general, in the sense that nothing is assumed 
about the given institution and also that nothing else is needed. But quantization 
[Diagram: the institutions $I$, $I^g$, $I^p$, $I^q$ and the comorphisms between them, including $C^{gp}$, $C^{gq}$ and $C^q$.]


Fig. 1. Institutions and (power-model) comorphisms. 


requires some additional information (the choice of qubit formulae). On the 
other hand, the quantum logic, as pointed out by the institutional approach, 
is not general enough (namely, injectivity of $V_\Sigma$ on models, and surjectivity of
the qubit translations). The solution seems to suggest a slight generalization 
of the exogenous approach towards working with multisets of models (as in 
Kripke structures), a promising line of further development of the approach. 
Furthermore, many interesting institution-theoretic questions remain open about 
these logics and the construction mechanisms discussed herein, like analyzing 
the properties of the constructions as functors on the category of institutions (or 
better, on some category of institutions), studying the underlying categories of 
models, and studying their impact on the properties of the resulting categories of
specifications. From a logic-theoretic point of view, the next step is to attempt
to extend the completeness results in [5] [B] to a general base institution.
Abstract. This paper is dedicated to Joseph Goguen, my beloved teacher
and friend, on the occasion of his 65th anniversary. It is a survey of
institution-independent model theory as it stands today, the true form
of abstract model theory which is based on the concept of institution.
Institution theory was co-fathered by Joseph Goguen and Rod Burstall
in the late 1970s. In the final part we discuss some philosophical roots of
institution-independent methodologies.


1 Introduction 


The theory of institutions is a categorical abstract model theory which formalises 
the intuitive notion of logical system, including syntax, semantics, and the
satisfaction between them. Institutions constitute a model-oriented meta-theory on
logics similarly to how the theory of rings and modules constitutes a meta-theory
for classical linear algebra. Another analogy can be made with universal algebra
versus groups, rings, modules, etc. By abstracting away from the realities of the 
actual conventional logics, it can be noticed that institution theory comes in fact 
closer to the realities of non-conventional logics. 

The notion of institution arose within computing science in the 1980s in response
to the population explosion of logics in use there,¹ with the ambition of doing
as much as possible at a level of abstraction independent of commitment to any 
particular logic. This mathematical paradigm is called ‘institution-independent’ 
(abbreviated i-i) computing science or model theory. 

Since their definition by Goguen and Burstall [11,31], institutions have become a
common tool in the study of algebraic specification theory and can be considered 
as its most fundamental mathematical structure. It is already an algebraic spec- 
ification tradition to have an institution underlying each language or system, in 
which all language/system constructs and features can be rigorously explained 
as mathematical entities. Most modern algebraic specification languages follow 
this tradition, including CASL [2], Maude [45], or CafeOBJ [25]. 


1 Some of them, such as first order (in many variants), second order, higher order, in- 
finitary, Horn, equational, partial, type theoretic, intuitionistic, modal (in many vari- 
ants), are well known or at least familiar to the ordinary logicians, while others such 
as linear, behavioural, process, rewriting, polymorphic, coalgebraic, object-oriented, 
etc. are known and used mostly in computing science. 


K. Futatsugi et al. (Eds.): Goguen Festschrift, LNCS 4060, pp. 65-498] 2006. 
© Springer-Verlag Berlin Heidelberg 2006 
An institution $I = (\mathrm{Sig}^I, \mathrm{Sen}^I, \mathrm{Mod}^I, \models^I)$ consists of

1. a category $\mathrm{Sig}^I$, whose objects are called signatures,

2. a functor $\mathrm{Sen}^I: \mathrm{Sig}^I \to \mathrm{Set}$, giving for each signature a set whose elements
are called sentences over that signature,

3. a functor $\mathrm{Mod}^I: (\mathrm{Sig}^I)^{op} \to \mathrm{Cat}$ giving for each signature $\Sigma$ a category
whose objects are called $\Sigma$-models, and whose arrows are called $\Sigma$-(model)
morphisms, and

4. a relation $\models^I_\Sigma \subseteq |\mathrm{Mod}^I(\Sigma)| \times \mathrm{Sen}^I(\Sigma)$ for each $\Sigma \in |\mathrm{Sig}^I|$, called
$\Sigma$-satisfaction,

such that for each morphism $\varphi: \Sigma \to \Sigma'$ in $\mathrm{Sig}^I$, the satisfaction condition

$$M' \models^I_{\Sigma'} \mathrm{Sen}^I(\varphi)(\rho) \quad \text{iff} \quad \mathrm{Mod}^I(\varphi)(M') \models^I_\Sigma \rho$$

holds for each $M' \in |\mathrm{Mod}^I(\Sigma')|$ and $\rho \in \mathrm{Sen}^I(\Sigma)$. When there is no danger of
ambiguity, we may skip the superscripts from the notation of the entities of the
institution, for example $\mathrm{Sig}^I$ may be simply denoted as Sig.

We denote the reduct functor $\mathrm{Mod}^I(\varphi)$ by $\_\upharpoonright_\varphi$ and the sentence translation
$\mathrm{Sen}^I(\varphi)$ by $\varphi(\_)$. When $M = M'\upharpoonright_\varphi$ we say that $M$ is a $\varphi$-reduct of $M'$, and that
$M'$ is a $\varphi$-expansion of $M$.
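
The satisfaction condition can be checked mechanically on a very small instance. The following is only a sketch under stated assumptions: it uses propositional sentences built from symbols, negation and conjunction, models as truth assignments, and signature morphisms as Python dictionaries. The function names (`translate`, `reduct`, `satisfies`, `satisfaction_condition_holds`) are illustrative and do not come from the paper.

```python
from itertools import product

# Sentences over a signature (a set of symbols) are nested tuples:
#   ("var", p), ("not", s), ("and", s1, s2)

def translate(phi, sentence):
    """Sen(phi): rename symbols along the signature morphism phi (a dict)."""
    tag = sentence[0]
    if tag == "var":
        return ("var", phi[sentence[1]])
    if tag == "not":
        return ("not", translate(phi, sentence[1]))
    return ("and", translate(phi, sentence[1]), translate(phi, sentence[2]))

def reduct(phi, model):
    """Mod(phi): turn a model of the target signature into one of the source."""
    return {p: model[phi[p]] for p in phi}

def satisfies(model, sentence):
    """Usual truth-table satisfaction of a sentence by a truth assignment."""
    tag = sentence[0]
    if tag == "var":
        return model[sentence[1]]
    if tag == "not":
        return not satisfies(model, sentence[1])
    return satisfies(model, sentence[1]) and satisfies(model, sentence[2])

def all_models(signature):
    """Enumerate every truth assignment for a finite signature."""
    symbols = sorted(signature)
    for values in product([False, True], repeat=len(symbols)):
        yield dict(zip(symbols, values))

def satisfaction_condition_holds(phi, target_signature, sentences):
    """Check M' |= phi(rho) iff M'|phi |= rho for all listed rho and all M'."""
    return all(
        satisfies(m2, translate(phi, s)) == satisfies(reduct(phi, m2), s)
        for m2 in all_models(target_signature)
        for s in sentences
    )

# A non-injective morphism from {p, q} to {a, b} (hypothetical example data).
phi = {"p": "a", "q": "a"}
sentences = [
    ("var", "p"),
    ("not", ("var", "q")),
    ("and", ("var", "p"), ("not", ("var", "q"))),
]
```

Calling `satisfaction_condition_holds(phi, {"a", "b"}, sentences)` compares both sides of the condition for every model of the target signature; the morphism is deliberately non-injective to show that the condition does not depend on injectivity.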

I-i model theory applies to a wide variety of logics; however, due to space
constraints, in this paper we will discuss only examples from classical first order
logic and some of its fragments and extensions. 


Classical (first order) logic as institution. Let FOL be the institution of many
sorted first order logic with equality. Its signatures $(S, F, P)$ consist of a set of
sort symbols $S$, a set $F$ of function symbols, and a set $P$ of relation symbols.
Each function or relation symbol comes with a string of argument sorts, called
arity, and for function symbols, a result sort. $F_{w \to s}$ denotes the set of function
symbols with arity $w$ and sort $s$, and $P_w$ the set of relation symbols with arity $w$.

Signature morphisms map the three components in a compatible way. Models
$M$ are first order structures interpreting each sort symbol $s$ as a set $M_s$, each
function symbol $\sigma$ as a function $M_\sigma$ from the product of the interpretations of the
argument sorts to the interpretation of the result sort, and each relation symbol
$\pi$ as a subset $M_\pi$ of the product of the interpretations of the argument sorts.
Sentences are the usual first order sentences built from equational and relational
atoms by iterative application of logical connectives and quantifiers. Sentence
translations rename the sorts, function, and relation symbols. For each signature
morphism $\varphi$, the reduct $M'\upharpoonright_\varphi$ of a model $M'$ is defined by $(M'\upharpoonright_\varphi)_x = M'_{\varphi(x)}$ for
each sort, function, or relation symbol $x$ from the domain signature of $\varphi$. The
satisfaction of sentences by models is the usual Tarskian satisfaction defined
inductively on the structure of the sentences.

Without loss of generality, for the sake of simplicity of presentation, we always
assume non-empty sorts for the models. This can be achieved in two ways. The
semantic solution is to consider only models for which $M_s \neq \emptyset$ for each sort
$s$. The syntactical solution is to consider only signatures having at least one
constant for each sort.

The institution PL of propositional logic is obtained as the sub-institution 
of FOL obtained by considering only the empty sorted signatures. 

Positive first order logic, FOL$^+$, restricts the FOL sentences only to those
constructed by means of $\wedge$, $\vee$, $\forall$, $\exists$, but not negation. Here $\vee$ and $\exists$ are no longer
reducible to $\wedge$ and $\forall$ and vice versa.

A universal Horn sentence in FOL for a first order signature $(S, F, P)$ is a
sentence of the form $(\forall X) H \Rightarrow C$, where $H$ is a finite conjunction of (relational
or equational) atoms and $C$ is a (relational or equational) atom, and $H \Rightarrow C$ is
the implication of $C$ by $H$. The sub-institution HCL, Horn clause logic, of FOL
has the same signatures and models as FOL but only universal Horn sentences
as sentences.

An algebraic signature (S, F) is just a FOL signature without relation sym- 
bols. The sub-institution of HCL which restricts the signatures only to the 
algebraic ones and the sentences to universally quantified equations is called 
equational logic and is denoted by EQL. 

EQLN is the minimal extension of EQL with negation, allowing sentences
obtained from atoms and negations of atoms through only one round of quantification,
either universal or existential. More precisely, all sentences have the
form $(QX)\, t_1 \pi t_2$ where $Q \in \{\forall, \exists\}$ and $\pi \in \{=, \neq\}$.

Let $\forall\vee$ be the sub-institution of FOL determined by the universal disjunctions
of atoms.

Infinitary first order logic, FOL$_{\infty,\omega}$, extends FOL by allowing infinite conjunctions.
Similarly, HCL$_\infty$ extends HCL by allowing the hypotheses $H$ of Horn
sentences $(\forall X) H \Rightarrow C$ to be infinite conjunctions of atoms. Also, $\forall\vee_\infty$ extends
$\forall\vee$ by allowing infinite disjunctions of atoms.

Other examples of institutions in use in computing science include partial,
rewriting [44], label algebra [6], higher-order [8], polymorphic, temporal
[30], process [30], behavioural [7], coalgebraic [13], object-oriented [32] logics, and
many more.











Significance of Institution-Independent Model Theory 


While the goal of i-i formal specification has been greatly accomplished in the 
algebraic specification literature, recently there has been significant progress to- 
wards model theory too. This responds to the feeling shared by some researchers 
that deep concepts and results in model theory can be reached in a significant 
way via institution theory. The significance of i-i model theory is manifold. 
First, it fulfils the main abstract model theory ideal by providing a uniform
generic approach to the model theory of various logics. This is especially relevant
for areas of knowledge involving a big variety of formal logical systems, most of
them unconventional. An important example comes from computing science in
general, and algebraic specification in particular. Related to this, institutions
also provide an ideal platform for exporting the rich and powerful body of concepts
and methods developed by conventional model theory to a multitude of
unconventional logics.


While conventional ‘abstract’ model theory of Barwise and Feferman 
extends first order logic explicitly by abstracting only sentences and satisfac- 
tion and leaving signatures and models concrete and conventional, institutions 
axiomatise the relationship between models and sentences by leaving them ab- 
stract. Because of this lack of commitment to any particular logic, institutions 
can be therefore considered as the true form of abstract model theory, some 
authors even calling this ‘abstract abstract model theory’... 


Then, i-i model theory has a special methodological significance. The i-i top- 
down way of obtaining a model theoretic result, or just viewing a concept, leads 
to a deeper understanding which is not suffocated by the (often irrelevant) details 
of the actual logic and guided by structurally clean causality. A model theoretic 
phenomenon is thus decomposed into various layers of abstract conditions, the 
concepts being defined and results obtained at the most appropriate level of 
abstraction. This contrasts with the traditional bottom-up approach in which 
the development is done at a given level of abstraction. Thus concepts come 
naturally as presumptive features that a “logic” might exhibit or not. Hypotheses 
are kept as general as possible and introduced on a by-need basis. Results and 
proofs are modular and easy to track down, despite sometimes very deep content. 
Another reason for the strength of i-i methodology is that institutions provide 
the most complete framework for abstract model theory, emphasising the multi- 
signature aspect of logics by considering signature morphisms and model reducts 
as primary concepts. 


Finally, institution theory provides an efficient framework for doing logic and
model theory ‘by translation or borrowing’ via a general theory of mappings
(homomorphisms) between institutions. For example, a certain property $P$ which
holds in an institution $I'$ can also be established in another institution $I$ provided
that we can define a mapping $I \to I'$ which ‘respects’ $P$.


Apart from re-structuring known model theoretic methods, i-i model theory
has already produced two classes of new concrete results. The first class is
represented by model theories for a multitude of less conventional logics which did
not properly have one. Out of i-i model theory, even a relatively well studied
area like partial algebra gets with minimal effort (in fact almost for free!) a well
developed and coherent body of advanced model theoretic concepts and results.
A second class of concrete applications is constituted by new results in classical
model theory obtained by institutional methods. At the moment of writing this
survey, we can report interpolation and definability for numerous Birkhoff-style
axiomatizable fragments of classical logic [2249] and the elegant solution to the
interpolation conjecture for many sorted logic [35]. The former results reveal
a strong causality relationship between axiomatizability, on the one hand, and
interpolation and definability, on the other hand. They also demount, or revise,
the causal relationship between interpolation and definability. Maybe in this second
class of applications we can also mention the considerably facilitated access
to highly non-trivial results in classical model theory, such as the Keisler-Shelah
Isomorphism Theorem.

This paper is a brief journey through i-i model theory as it stands today. A 
full textbook on this topic is under preparation [16].


2 Basic Concepts 


We assume the reader is familiar with basic notions and standard notations 
from category theory; e.g., see [38] for an introduction to this subject. By way 
of notation, |C| denotes the class of objects of a category C, C(A, B) the set 
of arrows with domain A and codomain B, and composition is denoted by “;” 
and in diagrammatic order. The category of sets (as objects) and functions (as 
arrows) is denoted by Set, and Cat is the category of all categories.² The opposite
of a category $C$ (obtained by reversing the arrows of $C$) is denoted $C^{op}$.

In the following we focus on some basic institution theory concepts. 


2.1 Presentations and Theories 


The satisfaction relation between models and sentences determines a Galois con- 
nection between the classes of models and the sets of sentences of a signature. 
Let $\Sigma$ be a signature in an institution (Sig, Sen, Mod, $\models$). Then

- for each set of $\Sigma$-sentences $E$, let $E^* = \{M \in |\mathrm{Mod}(\Sigma)| \mid M \models_\Sigma e \text{ for each } e \in E\}$, and
- for each class $\mathbb{M}$ of $\Sigma$-models, let $\mathbb{M}^* = \{e \in \mathrm{Sen}(\Sigma) \mid M \models_\Sigma e \text{ for each } M \in \mathbb{M}\}$.

These two functions denoted “$(\_)^*$” form what is known as a Galois connection.
Closed classes of models $\mathbb{M} = \mathbb{M}^{**}$ are called elementary and closed sets of
sentences $E = E^{**}$ are called theories.
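
To make the two "star" maps concrete, here is a minimal sketch over a deliberately tiny setting. Everything in it is an assumption made for illustration: the signature has two propositional symbols, the set of sentences is cut down to a finite fragment of four named sentences, and the names `models_of`, `theory_of` and `closure` are hypothetical. Closure is therefore computed relative to that finite fragment only.

```python
from itertools import product

SYMBOLS = ("p", "q")

# All models of the signature {p, q}: truth assignments.
MODELS = [dict(zip(SYMBOLS, values))
          for values in product([False, True], repeat=len(SYMBOLS))]

# A finite fragment of sentences, each given by its satisfaction test.
SENTENCES = {
    "p": lambda m: m["p"],
    "q": lambda m: m["q"],
    "p and q": lambda m: m["p"] and m["q"],
    "p or q": lambda m: m["p"] or m["q"],
}

def models_of(E):
    """E*: the models satisfying every sentence in E."""
    return [m for m in MODELS if all(SENTENCES[e](m) for e in E)]

def theory_of(Ms):
    """M*: the sentences (of the fragment) satisfied by every model in Ms."""
    return {e for e in SENTENCES if all(SENTENCES[e](m) for m in Ms)}

def closure(E):
    """E**: the theory presented by E, relative to the finite fragment."""
    return theory_of(models_of(E))
```

For instance, `closure({"p"})` adds `"p or q"` to `"p"`, and applying `closure` a second time changes nothing, which is the idempotence expected of a Galois closure.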

When $E$ and $E'$ are sets of sentences, $E^* \subseteq E'^*$ is denoted by $E \models E'$. Two
sentences $e$ and $e'$ of the same signature are semantically equivalent (denoted
as $e \mathrel{|\!\!\!=\!\!\!|} e'$) if they are satisfied by the same class of models, i.e., $\{e\} \models \{e'\}$
and $\{e'\} \models \{e\}$. Two models $M$ and $M'$ of the same signature are elementarily
equivalent (denoted as $M \equiv M'$) if they satisfy the same set of sentences, i.e.
$\{M\}^* = \{M'\}^*$. An institution is closed under isomorphisms when all isomorphic
models are elementarily equivalent. In this paper, we will always assume that 
our institutions are closed under isomorphisms. 

A theory $E$ is presented by a set of sentences $E_0$ if $E_0 \subseteq E$ and $E_0 \models E$, and
is finitely presented if there exists a finite $E_0$ which presents $E$. A presentation
morphism $\varphi: (\Sigma, E) \to (\Sigma', E')$ is a signature morphism such that $\varphi(E) \subseteq E'^{**}$.
A presentation morphism between theories is called a theory morphism. Notice
therefore that a theory morphism $\varphi: (\Sigma, E) \to (\Sigma', E')$ is a signature morphism
such that $\varphi(E) \subseteq E'$. It is easy to notice that under the composition of signature
morphisms, presentation morphisms, respectively theory morphisms, form categories
denoted Pres, respectively Th.

2 Strictly speaking, this is only a quasi-category living in a higher set-theoretic
universe.


Theorem 1. [31] The forgetful functors $\mathrm{Pres} \to \mathrm{Sig}$ and $\mathrm{Th} \to \mathrm{Sig}$ create limits
and colimits. Consequently, in any institution, the category of its
presentations/theories has whatever limits or colimits its category of signatures has.


For example, FOL has all small (co)limits of signatures,³ hence it also has all
(co)limits of presentations/theories.


2.2 Model Amalgamation 


The model amalgamation property discussed in the following is one of the very 
fundamental semantical properties of logics which underlies almost all i-i model 
theoretic developments. It is the merit of institution theory to have discovered it. 

In FOL, consider a model $M_1$ for a signature $\Sigma_1$ and a model $M_2$ for a
signature $\Sigma_2$ such that $M_1$ and $M_2$ are ‘consistent’ on the intersection of
their signatures, i.e. $M_1\upharpoonright_{\Sigma_1 \cap \Sigma_2} = M_2\upharpoonright_{\Sigma_1 \cap \Sigma_2}$. The two models $M_1$ and $M_2$ can
be ‘amalgamated’ to a model $M_1 \otimes M_2$ for the union of the two signatures by
$(M_1 \otimes M_2)_x = (M_1)_x$ when $x \in \Sigma_1$ or $(M_1 \otimes M_2)_x = (M_2)_x$ when $x \in \Sigma_2$.
Notice that this definition is correct because $M_1$ and $M_2$ are ‘consistent’ on
$\Sigma_1 \cap \Sigma_2$, and that the amalgamation is the unique $(\Sigma_1 \cup \Sigma_2)$-model such that
$(M_1 \otimes M_2)\upharpoonright_{\Sigma_1} = M_1$ and $(M_1 \otimes M_2)\upharpoonright_{\Sigma_2} = M_2$.
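
The intersection-union amalgamation can be sketched in a few lines. This is an illustration under simplifying assumptions, not the paper's construction in full generality: a signature is just a set of symbol names, a model is a dictionary mapping each symbol to its interpretation, and the names `reduct` and `amalgamate` as well as the example data are hypothetical.

```python
def reduct(model, symbols):
    """Restrict a model to the interpretations of the given symbols."""
    return {x: model[x] for x in symbols}

def amalgamate(sig1, m1, sig2, m2):
    """Amalgamate m1 and m2 into a model of the union signature.

    Raises ValueError when the two models are not 'consistent', i.e. when
    their reducts to the shared symbols differ.
    """
    shared = sig1 & sig2
    if reduct(m1, shared) != reduct(m2, shared):
        raise ValueError("models disagree on the shared symbols")
    # On shared symbols both models agree, so the merge order is irrelevant.
    return {**m1, **m2}

# Hypothetical example: both signatures share the sort symbol "s".
sig1 = {"s", "zero"}
sig2 = {"s", "succ"}
m1 = {"s": frozenset({0, 1}), "zero": 0}
m2 = {"s": frozenset({0, 1}), "succ": {0: 1, 1: 0}}
m12 = amalgamate(sig1, m1, sig2, m2)
```

The two defining equations of the amalgamation then read `reduct(m12, sig1) == m1` and `reduct(m12, sig2) == m2`; uniqueness is visible from the fact that every symbol of the union is interpreted by one of the two given models.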

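Such a model amalgamation property can be defined in any institution by
abstracting the intersection-union square of signatures to any commuting square
of signatures. In any institution, a commuting square of signature morphisms

$$\begin{array}{ccc}
\Sigma & \xrightarrow{\ \varphi_1\ } & \Sigma_1 \\
{\scriptstyle \varphi_2} \downarrow & & \downarrow {\scriptstyle \theta_1} \\
\Sigma_2 & \xrightarrow{\ \theta_2\ } & \Sigma'
\end{array}$$

Fig. 1


is an amalgamation square if and only if for each $\Sigma_1$-model $M_1$ and each $\Sigma_2$-model
$M_2$ such that $M_1\upharpoonright_{\varphi_1} = M_2\upharpoonright_{\varphi_2}$, there exists a unique $\Sigma'$-model $M_1 \otimes_{\varphi_1,\varphi_2} M_2$,
called the amalgamation of $M_1$ and $M_2$, such that $(M_1 \otimes_{\varphi_1,\varphi_2} M_2)\upharpoonright_{\theta_1} = M_1$ and
$(M_1 \otimes_{\varphi_1,\varphi_2} M_2)\upharpoonright_{\theta_2} = M_2$. When we relax the requirement on the uniqueness
of $M_1 \otimes_{\varphi_1,\varphi_2} M_2$, we say that this is a weak amalgamation square. This
amalgamation property is different from, and much more basic than, some of the model
amalgamation properties studied in classical model theory textbooks referring
to the existence of a common elementary extension of two models of the same
signature.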


3 One way to establish this is via general Grothendieck category constructions.
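From a categorical viewpoint, when we also involve the model homomorphisms,
the model amalgamation property says that

$$\begin{array}{ccc}
\mathrm{Mod}(\Sigma) & \xleftarrow{\ \mathrm{Mod}(\varphi_1)\ } & \mathrm{Mod}(\Sigma_1) \\
{\scriptstyle \mathrm{Mod}(\varphi_2)} \uparrow & & \uparrow {\scriptstyle \mathrm{Mod}(\theta_1)} \\
\mathrm{Mod}(\Sigma_2) & \xleftarrow{\ \mathrm{Mod}(\theta_2)\ } & \mathrm{Mod}(\Sigma')
\end{array}$$

is a pullback in Cat.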

At the level of arbitrary institutions model amalgamation can therefore be
regarded as a limit preservation property. An institution (Sig, Sen, Mod, $\models$) is
semi-/directed/inductive/weakly exact when the model functor $\mathrm{Mod}: \mathrm{Sig}^{op} \to \mathrm{Cat}$
preserves pullbacks/directed/inductive/weak⁴ limits, and is simply exact
when it preserves all small limits.

In general the many sorted institutions are exact, while the unsorted (or one- 
sorted) ones are only semi-exact. This is due to the fact that the initial signatures 
in the unsorted logics still have a sort, they are thus not initial as many sorted 
signatures. On the other hand the semi-exactness is not affected since pushouts 
of unsorted signatures are the same as pushouts of many sorted signatures. 





Theorem 2. [26] If the institution $I$ is semi-exact, then the theory model functor
$\mathrm{Mod}^{th}: (\mathrm{Th}^I)^{op} \to \mathrm{Cat}$ preserves pullbacks.⁵


This result can be of course immediately extended to other types of exactness, 
including full exactness. 


2.3 Elementary Diagrams 


The method of diagrams constitutes a traditional tool in many of the conven- 
tional first order model theory developments. Recall that the ‘positive diagram’ 
of any first order model M consists of all atoms satisfied by M in the signature 
extended with the elements of M. At the level of i-i model theory this is reflected 
as a categorical property which, in essence, formalises the idea that the class of 
model homomorphisms from a model M can be represented (by a natural iso- 
morphism) as a class of models of a theory in a signature extending the original 
signature with syntactic entities determined by M. This can be seen as a coher- 
ence property between the semantical structure and the syntactical structure of 
an institution. By following the basic idea that a structure is in reality defined 
by its homomorphisms, the semantical structure of an actual institution is given 
by the model homomorphisms. On the other hand the syntactical structure of 
an institution is essentially determined by the atomic sentences. 


4 Recall that a weak universal property, such as adjunction, limits, etc., is the same
as the ordinary universal property except that only the existence part is required,
uniqueness not being thus required.

5 $\mathrm{Mod}^{th}(\Sigma, E)$ is the full subcategory of $\mathrm{Mod}(\Sigma)$ of those models satisfying $E$.
An institution (Sig, Sen, Mod, $\models$) has elementary diagrams [20] if and only if
for each signature $\Sigma$ and each $\Sigma$-model $M$, there exists a signature morphism
$\iota_\Sigma(M): \Sigma \to \Sigma_M$, “functorial” in $\Sigma$ and $M$, and a set $E_M$ of $\Sigma_M$-sentences such
that $\mathrm{Mod}(\Sigma_M, E_M)$ and the comma category $M/\mathrm{Mod}(\Sigma)$ are naturally isomorphic,
i.e. the following diagram commutes by the isomorphism $i_{\Sigma,M}$ “natural”
in $\Sigma$ and $M$

$$\begin{array}{ccc}
\mathrm{Mod}(\Sigma_M, E_M) & \xrightarrow{\ i_{\Sigma,M}\ } & M/\mathrm{Mod}(\Sigma) \\
 & {\scriptstyle \mathrm{Mod}(\iota_\Sigma(M))} \searrow \quad \swarrow {\scriptstyle \text{forgetful}} & \\
 & \mathrm{Mod}(\Sigma) &
\end{array}$$

The signature morphism $\iota_\Sigma(M): \Sigma \to \Sigma_M$ is called the elementary extension
of $\Sigma$ via $M$ and the set $E_M$ of $\Sigma_M$-sentences is called the elementary diagram
of the model $M$. For each model homomorphism $h: M \to N$ let $N_h$ denote
$i^{-1}_{\Sigma,M}(h)$.

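The “functoriality” of $\iota$ means that for each signature morphism $\varphi: \Sigma \to \Sigma'$
and each $\Sigma$-model homomorphism $h: M \to M'\upharpoonright_\varphi$, there exists a presentation
morphism $\iota_\varphi(h): (\Sigma_M, E_M) \to (\Sigma'_{M'}, E_{M'})$ such that

$$\begin{array}{ccc}
\Sigma & \xrightarrow{\ \iota_\Sigma(M)\ } & \Sigma_M \\
{\scriptstyle \varphi} \downarrow & & \downarrow {\scriptstyle \iota_\varphi(h)} \\
\Sigma' & \xrightarrow{\ \iota_{\Sigma'}(M')\ } & \Sigma'_{M'}
\end{array}$$

commutes.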
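The “naturality” of $i$ means that for each signature morphism $\varphi: \Sigma \to \Sigma'$
and each $\Sigma$-model homomorphism $h: M \to M'\upharpoonright_\varphi$ the following diagram
commutes:

$$\begin{array}{ccc}
\mathrm{Mod}(\Sigma_M, E_M) & \xrightarrow{\ i_{\Sigma,M}\ } & M/\mathrm{Mod}(\Sigma) \\
{\scriptstyle \mathrm{Mod}(\iota_\varphi(h))} \uparrow & & \uparrow {\scriptstyle h/\mathrm{Mod}(\varphi)} \\
\mathrm{Mod}(\Sigma'_{M'}, E_{M'}) & \xrightarrow{\ i_{\Sigma',M'}\ } & M'/\mathrm{Mod}(\Sigma')
\end{array}$$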

Note that each elementary diagram $(\Sigma_M, E_M)$ has an initial model $M_M = i^{-1}_{\Sigma,M}(1_M)$.

An institution with elementary diagrams $\iota$ may be denoted by (Sig, Sen,
Mod, $\models$, $\iota$).

For example, classical model theory traditionally considers various kinds of
model homomorphisms; each of them determines different elementary diagrams.
Below we give a list of several possibilities, each of them corresponding to a
specific restriction on model homomorphisms, but with the same elementary
extensions. Let $M$ be a $\Sigma$-model for a FOL-signature $\Sigma$. Let $\iota_\Sigma(M)$ be the extension
$\Sigma \to \Sigma_M$ adding the elements of $M$ as new constants to $\Sigma$, and $M_M$ be the
$\iota_\Sigma(M)$-expansion of $M$ such that $(M_M)_m = m$ for each element $m \in M$.

model homomorphisms     $E_M$
all                     atoms in $M_M^*$
injective               atoms and negations of atomic equations in $M_M^*$
closed                  atoms and negations of atomic relations in $M_M^*$
closed and injective    atoms and negations of atoms in $M_M^*$
elementary embedding    $M_M^*$


Recall that a FOL-model homomorphism $h: M \to N$ is closed when $M_\pi = h^{-1}(N_\pi)$
for each relation symbol $\pi$ of the signature, and is an ‘elementary
embedding’ when $M_M \equiv N_h$. (Notice that because $(M_M)_m = m \neq m' = (M_M)_{m'}$ for all
$m, m' \in M$ which are different, $h$ is also injective.)
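
The difference between a plain homomorphism and a closed one can be illustrated on finite structures. The sketch below makes several simplifying assumptions: there is a single sort and a single relation symbol, a relation is a set of tuples, and a map between carriers is a dictionary. The names `is_homomorphism` and `is_closed` and the example data are hypothetical.

```python
from itertools import product

def is_homomorphism(h, rel_M, rel_N):
    """h preserves the relation: every tuple in M's relation is mapped into N's."""
    return all(tuple(h[a] for a in t) in rel_N for t in rel_M)

def is_closed(h, carrier_M, rel_M, rel_N, arity):
    """h is closed: M's relation is exactly the preimage of N's relation under h."""
    preimage = {
        t for t in product(carrier_M, repeat=arity)
        if tuple(h[a] for a in t) in rel_N
    }
    return preimage == set(rel_M)

# Hypothetical example: both elements of M are sent to the single element of N.
carrier_M = [0, 1]
h = {0: "a", 1: "a"}
rel_N = {("a",)}
```

With `rel_M = {(0,)}` the map `h` is a homomorphism but not closed, because `1` lies in the preimage of the relation of `N` without lying in the relation of `M`; with `rel_M = {(0,), (1,)}` it is closed.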

Elementary diagrams are used in many i-i model theoretic developments. 
For example, in the presence of elementary diagrams, limits and colimits of 
models can be obtained from corresponding limits and colimits of signatures. 
This is an important consequence of elementary diagrams because in the actual 
institutions, limits, and especially colimits of models are much more difficult to 
establish than (co)limits of signatures. 





Theorem 3. [20] Consider an institution with elementary diagrams and initial
models of presentations. Then, for each signature $\Sigma$, the category of $\Sigma$-models
has $J$-(co)limits whenever the category of signatures Sig has $J$-(co)limits.


From Theorem 3 we can immediately establish that, for any FOL signature, its
category of models has all small limits and colimits. For this, we actually have
to apply Theorem 3 to the fragment of FOL whose sentences are just (ground)
atoms. Because any Horn presentation has initial models, we can extend this 
argument to a stronger result: the models of any Horn theory have all small 
limits and colimits. 


2.4 Free Models 


The problem of existence of free models in institutions is often represented by 
the problem of existence of initial models for theories. For example, in FOL the 
largest class of theories admitting initial models is that of theories of universal 
Horn sentences. 

At the level of an arbitrary institution (Sig, Sen, Mod, $\models$), a theory morphism
$\varphi: (\Sigma, E) \to (\Sigma', E')$ is liberal if and only if the reduct functor
$\mathrm{Mod}^{th}(\varphi): \mathrm{Mod}^{th}(\Sigma', E') \to \mathrm{Mod}^{th}(\Sigma, E)$ has a left-adjoint $(\_)^\varphi$.


Theorem 4. [20] A semi-exact institution with elementary diagrams and
pushouts of signatures is liberal when each theory has an initial model. Conversely,
if the institution has initial signatures and is exact, each theory has an
initial model whenever the institution is liberal.


When we apply Theorem 4 to FOL, we get that HCL is liberal.
3 Internal Logic


Much of the i-i development of model theory relies on the possibility of defining 
concepts such as logical connectives, quantification, and atomic sentences inter- 
nally to any institution. The main implication of this fact is that the abstract 
satisfaction relation between models and sentences can be decomposed at the 
level of arbitrary institutions into several concrete layers of satisfaction defined 
categorically in terms of (a simple form of) injectivity and reduction. Essentially 
speaking, this ‘internal logic’ is what gives depth to the i-i approach to model 
theory. 


3.1 Boolean Connectives and Quantifiers 

Boolean connectives. Given a signature $\Sigma$ in an institution

- the $\Sigma$-sentence $\rho'$ is a (semantic) negation of $\rho$ when $\rho'^* = |\mathrm{Mod}(\Sigma)| \setminus \rho^*$,
and

- the $\Sigma$-sentence $\rho'$ is the (semantic) conjunction [53] of the $\Sigma$-sentences $\rho_1$
and $\rho_2$ when $\rho'^* = \rho_1^* \cap \rho_2^*$.


We can easily notice that negations and conjunctions of sentences are unique
modulo semantical equivalence.

An institution has (semantic) negation when each sentence of the institution
has a negation, and has (semantic) conjunctions when each two sentences (of the
same signature) have a conjunction. Distinguished negations are often denoted
by $\neg\_$, while distinguished conjunctions by $\_ \wedge \_$.

All these can be extended in the same way to other Boolean connectives, such
as disjunction ($\vee$), implication ($\Rightarrow$), equivalence ($\Leftrightarrow$), etc., and also infinitary
conjunctions and disjunctions. An institution which has all semantic Boolean
connectives is called a Boolean complete institution.

Notice that while FOL is Boolean complete, EQL and HCL have no semantic
Boolean connectives.
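
Because semantic connectives are defined purely through model classes, they can be tested by comparing sets of models. The following sketch is an illustration on a toy propositional setting and not on EQL or HCL themselves: a model is the set of symbols it makes true, each named sentence is represented by its model class, and all names (`star`, `RICH`, `ATOMS_ONLY`, `is_negation`, `is_conjunction`, `has_negation`) are hypothetical.

```python
from itertools import product

SYMBOLS = ("p", "q")

# A model is the frozenset of symbols it makes true.
MODELS = frozenset(
    frozenset(s for s, v in zip(SYMBOLS, values) if v)
    for values in product([False, True], repeat=len(SYMBOLS))
)

def star(satisfied_by):
    """The class of all models passing the given satisfaction test."""
    return frozenset(m for m in MODELS if satisfied_by(m))

# Two sentence fragments, each mapping a sentence name to its model class.
RICH = {
    "p": star(lambda m: "p" in m),
    "q": star(lambda m: "q" in m),
    "not p": star(lambda m: "p" not in m),
    "p and q": star(lambda m: "p" in m and "q" in m),
}
ATOMS_ONLY = {name: RICH[name] for name in ("p", "q")}

def is_negation(classes, rho_prime, rho):
    """rho' is a semantic negation of rho: its models are the complement."""
    return classes[rho_prime] == MODELS - classes[rho]

def is_conjunction(classes, rho_prime, rho1, rho2):
    """rho' is a semantic conjunction of rho1 and rho2: intersection of models."""
    return classes[rho_prime] == classes[rho1] & classes[rho2]

def has_negation(classes, rho):
    """Some sentence of the fragment is a semantic negation of rho."""
    return any(is_negation(classes, candidate, rho) for candidate in classes)
```

In the `RICH` fragment the sentence `"not p"` is a semantic negation of `"p"`, whereas in `ATOMS_ONLY` no sentence is, which mirrors on a small scale how a restricted sentence fragment can lack semantic connectives.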


Quantifiers. Given a FOL signature $(S, F, P)$ and a set $X$ of (new)
variables (for $S$), any $(S, F \uplus X, P)$-sentence can be regarded as an ‘open’
$(S, F, P)$-sentence with ‘unbound’ variables $X$. When there are no unbound variables,
an open sentence is just an ordinary (‘closed’) sentence. Recall that for any
$(S, F, P)$-model $M$, $M \models (\exists X)\rho$ if and only if there exists an $(S, F \uplus X, P)$-expansion
$M'$ of $M$ such that $M' \models \rho$.
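
Reading quantification as a search over expansions translates directly into code for finite models. The sketch below assumes a single-sorted finite model stored as a dictionary with a carrier, constants and relations; an open sentence is represented by a Python predicate on expanded models. The names `expansions`, `satisfies_exists`, `satisfies_forall` and the example model are hypothetical.

```python
from itertools import product

def expansions(model, variables):
    """All expansions of `model` interpreting each new variable as a carrier element."""
    carrier = sorted(model["carrier"])
    for values in product(carrier, repeat=len(variables)):
        constants = {**model["constants"], **dict(zip(variables, values))}
        yield {**model, "constants": constants}

def satisfies_exists(model, variables, open_sentence):
    """M |= (exists X) rho iff some expansion of M to the variables satisfies rho."""
    return any(open_sentence(m) for m in expansions(model, variables))

def satisfies_forall(model, variables, open_sentence):
    """M |= (forall X) rho iff every expansion of M to the variables satisfies rho."""
    return all(open_sentence(m) for m in expansions(model, variables))

# Hypothetical example model with one unary relation.
M = {
    "carrier": {0, 1, 2},
    "constants": {},
    "relations": {"even": {(0,), (2,)}},
}

def even_x(expansion):
    """The open sentence even(x), evaluated in an expansion interpreting x."""
    return (expansion["constants"]["x"],) in expansion["relations"]["even"]
```

Here `satisfies_exists(M, ["x"], even_x)` holds while `satisfies_forall(M, ["x"], even_x)` does not, since the expansion interpreting `x` as `1` fails `even(x)`.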

The concept of quantification can be defined ‘internally’ to any institution
$I$ by abstracting FOL signature inclusions $(S, F, P) \hookrightarrow (S, F \uplus X, P)$ to any
signature morphism $\chi: \Sigma \to \Sigma'$ in $I$. Therefore at the abstract level of arbitrary
institutions

- a $\Sigma$-variable is just a signature morphism $\chi: \Sigma \to \Sigma'$,

- an (open) $\chi$-sentence is just a $\Sigma'$-sentence,

- a $\Sigma$-sentence $\rho$ is a (semantic) existential $\chi$-quantification [53] of a
$\chi$-sentence $\rho'$ when $\rho^* = (\rho'^*)\upharpoonright_\chi$; in this case we may write $\rho$ as $(\exists\chi)\rho'$,

- a $\Sigma$-sentence $\rho$ is a (semantic) universal $\chi$-quantification [53] of a $\chi$-sentence
$\rho'$ when $\rho \mathrel{|\!\!\!=\!\!\!|} \neg(\exists\chi)\neg\rho'$; in this case we may write $\rho$ as $(\forall\chi)\rho'$.

For a class $D \subseteq \mathrm{Sig}$ of signature morphisms, we say that the institution has
universal/existential $D$-quantification when for each $\chi: \Sigma \to \Sigma'$ in $D$, each
$\Sigma'$-sentence has a universal/existential $\chi$-quantification.

Generally, one may consider quantification only up to what the respective
concept of signature supports. For example, FOL signatures support
quantifications only up to second order; for higher order quantifications one
needs to involve a different concept of signature, coding higher order types.

Based on this internal concept of variable, in [21] we have introduced an
internal general concept of substitution, which captures first-order, second-order,
and higher-order substitutions in actual logics.








Finitary signature morphisms. In conventional (first order) model theory, the
quantifications are finitary. At the level of abstract signature morphisms in
institutions, we say that a signature morphism $\chi\colon \Sigma \to \Sigma'$ is
finitary when for each directed diagram of $\Sigma$-models
$(M_i \to M_j)_{(i<j) \in (I,\leq)}$ with a colimit
$(\mu_i\colon M_i \to M)_{i \in I}$ and each $\chi$-expansion M' of M there exists
an index $i \in I$ and a $\chi$-expansion $\mu'_i$ of $\mu_i$.


3.2 Representable Signature Morphisms 


Quasi-representable signature morphisms. For any FOL signature (S, F, P) and
any set X of variables, given any (S, F, P)-model homomorphism $h\colon M \to N$,
any $(S, F \uplus X, P)$-expansion M' of M determines uniquely an
$(S, F \uplus X, P)$-expansion $h'\colon M' \to N'$ of h defined by $h' = h$ and
$N'_x = h(M'_x)$ for each $x \in X$. In general, in FOL this property holds only
for first order variables, and can be seen as an i-i generalisation of the
concept of first order variable. This is important because many model theoretic
results depend upon restricting quantification to first order.

In any institution, a signature morphism $\chi\colon \Sigma \to \Sigma'$ is
quasi-representable [16] when for each $\Sigma'$-model M', the canonical functor
below determined by the reduct functor $Mod(\chi)$ is an isomorphism (of comma
categories)

$$M'/Mod(\Sigma') \cong (M'\upharpoonright_{\chi})/Mod(\Sigma)$$


Usual ‘first order’ variables in actual standard institutions, but also in institu- 
tions such as E(FOL) (of FOL elementary embeddings) constitute examples of 
quasi-representable signature morphisms. However, this concept accommodates 
also other less conventional types of variables. For example, in the restriction of 
REL (relational logic restricting FOL signatures only to those without oper- 
ation symbols) to strong model homomorphisms, any signature extension with 
constants and/or relation symbols is quasi-representable. 


Proposition 1. [16, 19] 1. In any institution the (finitary) quasi-representable
signature morphisms form a subcategory of Sig.
2. If the institution is semi-exact, then quasi-representable signature morphisms
are stable under pushouts.


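Consider a quasi-representable signature morphism $\chi\colon \Sigma \to \Sigma'$
and assume that $Mod(\Sigma')$ has an initial model $0_{\Sigma'}$. We have the
following canonical isomorphisms:

$$Mod(\Sigma') \cong 0_{\Sigma'}/Mod(\Sigma') \cong (0_{\Sigma'}\upharpoonright_{\chi})/Mod(\Sigma)$$

This situation shows that the $\Sigma'$-models M' can be 'represented'
isomorphically by $\Sigma$-model homomorphisms $M_\chi \to M'\upharpoonright_{\chi}$,
where $M_\chi$ denotes $0_{\Sigma'}\upharpoonright_{\chi}$.

Therefore, a signature morphism $\chi\colon \Sigma \to \Sigma'$ is representable
[19] if and only if there exists a $\Sigma$-model $M_\chi$ (called the
representation of $\chi$) and an isomorphism $i_\chi$ of categories such that the
following diagram commutes:

$$\begin{array}{ccc}
Mod(\Sigma') & \xrightarrow{\; i_\chi \;} & M_\chi/Mod(\Sigma) \\
 & {\scriptstyle Mod(\chi)} \searrow & \downarrow {\scriptstyle \text{forgetful}} \\
 & & Mod(\Sigma)
\end{array}$$

Fact 1 A signature morphism $\chi\colon \Sigma \to \Sigma'$ is representable if
and only if it is quasi-representable and $Mod(\Sigma')$ has an initial model.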


Therefore, in FOL representable and quasi-representable signature morphisms
are the same concept. For example, given a set X of variables for a FOL signature
(S, F, P), the representation of the signature inclusion
$(S, F, P) \to (S, F \uplus X, P)$ is given by the (free) term F-algebra $T_F(X)$.
This corresponds to the fact that $(S, F \uplus X, P)$-models M are in canonical
bijection with valuations of variables from X to the carrier of M, which, by the
freeness of $T_F(X)$, are in canonical bijection with (S, F, P)-model
homomorphisms $T_F(X) \to M$.
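
The bijection between valuations and homomorphisms out of the term algebra can
be made concrete in the one-sorted case. The sketch below is illustrative only:
the term encoding (nested tuples), the helper `extend_valuation`, and the example
model on natural numbers are assumptions, not notation from the text.

```python
def extend_valuation(valuation, operations):
    """Extend a valuation of variables to the unique homomorphism on terms.

    Terms are nested tuples ("op", arg1, ..., argn) or variable names (str).
    `operations` maps each operation symbol to its interpretation in the model.
    """
    def evaluate(term):
        if isinstance(term, str):          # a variable from X
            return valuation[term]
        symbol, *args = term               # an operation applied to subterms
        return operations[symbol](*(evaluate(a) for a in args))
    return evaluate

# Hypothetical one-sorted model: natural numbers with zero, succ and plus.
operations = {
    "zero": lambda: 0,
    "succ": lambda n: n + 1,
    "plus": lambda m, n: m + n,
}

# The valuation x -> 2, y -> 5 determines the homomorphism h on all terms.
h = extend_valuation({"x": 2, "y": 5}, operations)
term = ("plus", ("succ", "x"), ("plus", "y", ("zero",)))
value = h(term)   # interprets succ(x) + (y + zero) under the valuation
```

Restricting `h` to variables gives back the valuation, and structural recursion
leaves no other choice for the value on compound terms, which is the freeness
property used above.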


Proposition 2. [14] A FOL-signature morphism is representable if and only 
if it is bijective on sort symbols, relation symbols, and non-constant operation 
symbols. 


First-order substitutions can be recovered from the internal concept of sub- 
stitution between representable signature morphisms; at the general level they 
are called representable substitutions [BPI]. 


3.3 Basic Sentences 


Given any set of atoms (either equational or relational) E for a FOL-signature
(S, F, P), let $0_E$ be the initial (S, F, P)-model satisfying E. Notice that

$M \models E$ if and only if there exists a model homomorphism $0_E \to M$

for each (S, F, P)-model M.

Given a signature $\Sigma$ in an arbitrary institution, a set E of
$\Sigma$-sentences is basic if there exists a $\Sigma$-model $M_E$ such that for
each $\Sigma$-model M, $M \models_\Sigma E$ if and only if there exists a model
homomorphism $M_E \to M$.

Notice that not all sentences admitting an initial model are basic. A
counterexample is given by negations of equations $t_1 \neq t_2$ in an algebraic
signature (S, F).⁶ On the other hand, not all basic sentences are atoms or
conjunctions of atoms. For example, it can be shown that FOL existentially
quantified atoms are basic too.

When the model homomorphisms $M_E \to M$ are also unique, then we say
that E is epic basic. We say that a sentence $\rho$ is (epic) basic when
$\{\rho\}$ is (epic) basic. Note that in FOL all atoms are epic basic.

We say that a basic set of sentences E is finitary if the model $M_E$ is finitely
presented in the category $Mod(\Sigma)$ of $\Sigma$-models. Note that in FOL any
finite set of atoms is finitary basic.
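
The defining property of basic sets of sentences can be tested on a very small
fragment. The sketch below assumes a signature with constants only and equations
between constants, so that the initial model satisfying E is the set of
constants modulo the equivalence generated by E; all function names are
hypothetical.

```python
from itertools import product

def initial_model(constants, equations):
    """Map each constant to a representative of its class modulo the equations."""
    parent = {c: c for c in constants}
    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c
    for left, right in equations:
        parent[find(left)] = find(right)
    return {c: find(c) for c in constants}

def homomorphism_exists(classes, interpretation):
    """A homomorphism must send the class of c to the value of c in the model."""
    image = {}
    for constant, cls in classes.items():
        if image.setdefault(cls, interpretation[constant]) != interpretation[constant]:
            return False
    return True

def satisfies(equations, interpretation):
    """Ordinary satisfaction of a set of equations between constants."""
    return all(interpretation[l] == interpretation[r] for l, r in equations)

constants = ["a", "b", "c"]
equations = [("a", "b"), ("b", "c")]
classes = initial_model(constants, equations)

# Compare both sides of the 'basic' condition on every model with carrier {0, 1}.
agreement = all(
    homomorphism_exists(classes, dict(zip(constants, values)))
    == satisfies(equations, dict(zip(constants, values)))
    for values in product([0, 1], repeat=len(constants))
)
```

In this fragment a homomorphism out of the initial model is forced on every
class, so when it exists it is unique, in line with the remark that FOL atoms
are epic basic.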





Proposition 3. [16] In any institution with elementary diagrams with
quasi-representable elementary extensions, the elementary diagrams are basic.


In any institution, a universal Horn sentence is a sentence semantically
equivalent to $(\forall\chi)E \Rightarrow E'$ where $\chi\colon \Sigma \to \Sigma'$
is a quasi-representable signature morphism, E is an epic basic set of
$\Sigma'$-sentences, and E' is a basic set of $\Sigma'$-sentences. A universal Horn
sentence $(\forall\chi)E \Rightarrow E'$ is finitary when E, E' and $\chi$ are
finitary. Notice that universal Horn sentences in FOL, as defined in the
previous chapter, are the FOL instances of the i-i finitary Horn sentences.


3.4 Elementary Homomorphisms 


The classical model theoretic concept of elementary embedding can be abstracted 
to any institution (with elementary diagrams) as follows. 

First notice that in any institution with elementary diagrams, the elementary
diagram of any model M has an initial model, denoted $M_M$. Then a model
homomorphism $h\colon M \to N$ is elementary [34] when $N_h \models M_M^{*}$.


Fact 2 For each elementary homomorphism $h\colon M \to N$, $M^{*} \subseteq N^{*}$.


Based on the internal concept of open sentence, one may define another
concept of elementary homomorphism which does not require elementary diagrams.
Given a class $D \subseteq Sig$ of signature morphisms, a $\Sigma$-model
homomorphism $h\colon A \to B$ is D-elementary when
${A'}^{*} \subseteq {B'}^{*}$ for each D-expansion $h'\colon A' \to B'$ of h.

In the actual institutions, D is usually taken to be the class of all signature 
extensions with constants. Notice that in the case of FOL, and in fact in all 
institutions with finitary sentences, elementarity with respect to signature ex- 
tensions with arbitrary number of constants is equivalent to elementarity with 
respect to extensions adding finite numbers of constants. Notice that in these 
situations the following applies well. 


6 This has the term model $T_F$ as its initial model, however it is not basic.

Proposition 4. [34] In a weakly semi-exact institution, let D be a class of
quasi-representable signature morphisms which is stable under pushouts. Then
the D-elementary homomorphisms form a sub-institution of the original institution.


In the case of FOL, when we take D the class of finite signature extensions with 
constants, this just says that FOL elementary embeddings form an institution. 

In the presence of elementary diagrams satisfying certain ‘normality’ condi- 
tions (see for the definition), which is very natural in actual institutions, 
the two notions of elementary homomorphisms coincide. This leads to another 
important fact: the elementary homomorphisms attached to a system of elemen- 
tary diagrams bring their own system of elementary diagrams, which is in fact 
“more elementary” than the starting one. 


Corollary 1. [34] In any weakly semi-exact institution I with 'D-normal'
elementary diagrams for D a class of quasi-representable signature morphisms
which is stable under pushouts, and such that it contains all elementary
extensions, the elementary homomorphisms form a (sub-)institution E(I) which has
elementary diagrams. E(I) is called the elementary sub-institution of I.


Theorem 5 below is an i-i generalisation of Tarski's famous Elementary Chain
Theorem [57], which is used for many results in classical model theory (see [12]).
It shows that the closure of elementary homomorphisms under directed colimits
holds when the institution either has all negations (such as FOL, EQLN),
or no negation at all (such as $FOL^{+}$, EQL), and that it may fail in
intermediate cases (such as HCL).


Theorem 5. [34] Assume one of the following: 

- each sentence is accessible from the basic ones by (possibly infinite) con- 
junctions, disjunctions, universal D-quantifications, and finitary existential D- 
quantifications, or 

- the institution has negations and each sentence is accessible from the 
basic ones by (possibly infinite) conjunctions, negations, and finitary D- 
quantifications. 

Then the class of D-elementary homomorphisms (or just elementary
homomorphisms if in addition the institution has D-normal elementary diagrams and
D contains all elementary extensions) is closed under directed colimits.


4 Model Ultraproducts 


Much of the conventional model theory can be developed through the powerful 
method of ultraproducts (see for example [86]). The i-i method of ultraprod- 
ucts employs the following well known categorical concept of filtered products 
43} 1/4041). 

Let C be a category with small products and small directed colimits. Consider
a family of objects $\{A_i\}_{i \in I}$. Each filter F over the set of indices I
determines a functor $A_F\colon F \to C$ such that
$A_F(J \subseteq J') = p_{J',J}\colon \prod_{i \in J'} A_i \to \prod_{i \in J} A_i$
for each $J, J' \in F$ with $J \subseteq J'$, and with $p_{J',J}$ being the
canonical projection. Then the filtered product of $\{A_i\}_{i \in I}$ modulo F is
the colimit $\mu\colon A_F \Rightarrow \prod_F A_i$ of the functor $A_F$.

$$\begin{array}{ccc}
\prod_{i \in J'} A_i & \xrightarrow{\; p_{J',J} \;} & \prod_{i \in J} A_i \\
 & {\scriptstyle \mu_{J'}} \searrow & \downarrow {\scriptstyle \mu_J} \\
 & & \prod_F A_i
\end{array}$$

If F is an ultrafilter then the filtered product modulo F is called an
ultraproduct. When $A_i = A$ for all $i \in I$, then the filtered product is
called filtered power. Notice that a (direct) product $\prod_{i \in I} A_i$ is the
same as the filtered product $\prod_{\{I\}} A_i$.
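
At the level of underlying sets, and for a finite index set, the filtered
product can be computed directly as a quotient of the direct product: two tuples
are identified when they agree on some member of the filter. The sketch below is
an assumption-laden illustration (finite sets only, principal filters only, and
hypothetical function names), not the general categorical construction.

```python
from itertools import product

def principal_filter(index_set, generator):
    """All subsets of index_set that contain `generator` (a principal filter)."""
    rest = sorted(set(index_set) - set(generator))
    members = set()
    for mask in range(2 ** len(rest)):
        extra = {rest[k] for k in range(len(rest)) if mask >> k & 1}
        members.add(frozenset(generator) | extra)
    return members

def filtered_product(family, filt):
    """Quotient the direct product by 'agree on some member of the filter'."""
    indices = sorted(family)
    classes = []  # each class is a list of tuples of the direct product
    for element in product(*(family[i] for i in indices)):
        for cls in classes:
            agreement = frozenset(
                i for i, a, b in zip(indices, element, cls[0]) if a == b
            )
            if agreement in filt:
                cls.append(element)
                break
        else:
            classes.append([element])
    return classes

family = {0: [0, 1], 1: [0, 1, 2], 2: [0, 1]}

# Principal ultrafilter generated by {1}: the ultraproduct collapses to A_1.
ultraproduct = filtered_product(family, principal_filter(family, {1}))

# The filter {I}: nothing is identified, so we recover the direct product.
direct = filtered_product(family, principal_filter(family, {0, 1, 2}))
```

Comparing against a single representative per class is enough because filters
are closed under finite intersections and supersets, which makes the agreement
relation an equivalence. On a finite index set every ultrafilter is principal, so
interesting ultraproducts require infinite index sets, which this sketch does
not cover.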

Categorical filtered products permit the definition of filtered products of 
models in any institution with small products and small directed colimits for 
each of its categories of models. We say that an institution has (small) prod- 
ucts/directed colimits of models when all its categories of models have (small) 
products/directed colimits. 

In the case of FOL, model products are easy and directed colimits of models
are created by the forgetful functor to (the underlying) sets because of the
finiteness of the arities of the operations and relations. Alternatively, one may
use the FOL corollary of Theorem [B]. Categorical filtered products in FOL are
the same as the classical filtered products first introduced in [89].

For a signature $\Sigma$ in an institution, a $\Sigma$-sentence e is

- preserved by $\mathcal{F}$-filtered factors if $\prod_F A_i \models_\Sigma e$
implies $\{i \in I \mid A_i \models_\Sigma e\} \in F$, and

- preserved by $\mathcal{F}$-filtered products if
$\{i \in I \mid A_i \models_\Sigma e\} \in F$ implies $\prod_F A_i \models_\Sigma e$,

for each filter $F \in \mathcal{F}$ over a set I and for each family
$\{A_i\}_{i \in I}$ of $\Sigma$-models. A sentence is a Łoś-sentence [19] when it
is preserved by all ultrafactors and all ultraproducts.


Theorem 6. [19] In any institution, the Łoś-sentences

- contain all finitary basic sentences, 

- are closed under Boolean connectives, 

- are closed under any finitary representable quantification, and 

- are closed under any projectively representable quantification if the institu- 
tion has epic model projections. 


An institution is a Łoś-institution [19] if and only if all its sentences are
Łoś-sentences. For example, FOL is a Łoś-institution because each sentence is
accessible from equations and relational atoms, which are finitary basic, by
finitary representable quantifications and Boolean connectives. Instead of
finitary representable quantification we may alternatively use the argument of
projectively representable quantifications. This shows that the extension of FOL
with infinitary quantifications is also a Łoś-institution. However the extension
$FOL_{\infty,\omega}$ of FOL to infinitary conjunctions is not a Łoś-institution.

Compactness. An institution is m-compact if each set of sentences is consistent
when all its finite subsets have at least one model. If for each set of sentences
E and each sentence e, $E \models e$ implies the existence of a finite subset
$E_0 \subseteq E$ such that $E_0 \models e$, then we say that the institution is
compact.

In the light of Theorem 6, the following constitutes an i-i Compactness
Theorem.


Corollary 2. Any Łoś-institution is (m-)compact.


5 Saturated Models 


Saturated models are used in many model theoretic developments (see [12]), and
they can be approached naturally in an i-i framework.


Chains and $(\lambda, D)$-saturated models. In any category C, for any ordinal
$\lambda$, a $\lambda$-chain [27] is a $\lambda$-diagram
$(f_{i,j}\colon A_i \to A_j)_{i<j\leq\lambda}$ such that for each limit ordinal
$\zeta \leq \lambda$, $(f_{i,\zeta})_{i<\zeta}$ is the colimit of
$(f_{i,j})_{i<j<\zeta}$.

For any class of arrows $D \subseteq C$, a $(\lambda, D)$-chain [27] is any
$\lambda$-chain $(f_{i,j})_{i<j\leq\lambda}$ such that $f_{i,i+1} \in D$ for each
$i < \lambda$.

For each signature morphism $\chi\colon \Sigma \to \Sigma'$, a $\Sigma$-model M
$\chi$-realizes a set E' of $\Sigma'$-sentences if there exists a
$\chi$-expansion M' of M which satisfies E'. It $\chi$-realizes E' finitely if it
$\chi$-realizes every finite part of E'. A $\Sigma$-model M is
$(\lambda, D)$-saturated [27] for $\lambda$ a cardinal and D a class of signature
morphisms when for each ordinal $\alpha < \lambda$ and each $(\alpha, D)$-chain
$(\varphi_{i,j}\colon \Sigma_i \to \Sigma_j)_{i<j\leq\alpha}$ with
$\Sigma_0 = \Sigma$, for each $(\chi\colon \Sigma_\alpha \to \Sigma') \in D$, each
$\varphi_{0,\alpha}$-expansion of M $\chi$-realizes any set of sentences if and
only if it $\chi$-realizes it finitely.

The traditional concept of $\lambda$-saturated model can be recovered from this
by considering D to be the class of FOL signature extensions with a finite number
of constants.


$\lambda$-small signature morphisms. A signature morphism
$\chi\colon \Sigma \to \Sigma'$ is $\lambda$-small [27] for a cardinal $\lambda$
when for each chain $(f_{i,j}\colon M_i \to M_j)_{0\leq i<j\leq\lambda}$ of
$\Sigma$-homomorphisms and each $\chi$-expansion M' of $M_\lambda$, there exists
$i < \lambda$ and $f'_{i,\lambda}\colon M'_i \to M'$ a $\chi$-expansion of
$f_{i,\lambda}$. For example, finitary signature morphisms are $\omega$-small.

The following shows that each model can be elementarily embedded into a 
saturated model, thus providing an existence theorem for saturated models. 


Theorem 7. [27] Consider an institution with a class D of signature morphisms
such that
1. $M \equiv N$ if there exists a model homomorphism $M \to N$,
2. it has finite conjunctions and existential D-quantifications,
3. it has inductive colimits of signatures and is inductive-exact,
4. for each signature $\Sigma$, the category of $\Sigma$-models has inductive
colimits,
5. for each signature morphism $\chi\colon \Sigma \to \Sigma'$ and E' set of
$\Sigma'$-sentences, if A realizes E' finitely then there exists a model
homomorphism $A \to B$ such that B realizes E',
6. for each signature morphism $\chi\colon \Sigma \to \Sigma'$ and each
$\Sigma$-model M, the class of $\chi$-expansions of M forms a set, and
7. each signature morphism from D is quasi-representable, the category Sig
of signatures is D-co-well-powered, and for each ordinal $\lambda$ there exists a
cardinal $\alpha$ such that each $(\lambda, D)$-chain is $\alpha$-small.

Then for any cardinal $\lambda$ and for each $\Sigma$-model M there exists a
$\Sigma$-homomorphism $M \to N$ such that N is $(\lambda, D)$-saturated.


Applications of Theorem 7 consider elementary institutions. This means
that in the case of FOL, the considered institution should be in fact the
sub-institution of E(FOL) with (arbitrarily large) signature extensions with
constants as signature morphisms (in order to fulfil the inductive-exactness
condition). Then it is rather easy to establish the other conditions underlying
Theorem 7. The most delicate are 4., which invokes Tarski's Elementary Chain
Theorem (see Theorem 5), and 5., which follows from compactness.

The uniqueness of saturated models is probably the crucial result which is 
used in the applications of saturated model theory. At the i-i level this requires 
to spell out the following rather natural property of elementary extensions. 


Simple elementary diagrams. The elementary diagrams $\iota$ of an institution are
simple [27] when for each signature $\Sigma$ and all $\Sigma$-models A, B, for
each $\iota_\Sigma(B)$-expansion A' of A, the following is a pushout square of
signature morphisms.

$$\begin{array}{ccc}
\Sigma & \xrightarrow{\; \iota_\Sigma(B) \;} & \Sigma_B \\
{\scriptstyle \iota_\Sigma(A)} \downarrow & & \downarrow {\scriptstyle \iota_{\Sigma_B}(A')} \\
\Sigma_A & \xrightarrow{\; \iota_{\iota_\Sigma(B)}(1_A) \;} & (\Sigma_B)_{A'}
\end{array}$$

It is easy to notice that in actual examples, those elementary diagrams such that
their elementary extensions just add the elements of the model as new constants
to its signature, like in FOL, are simple because the above diagram is in fact a
diagram of the form

$$\begin{array}{ccc}
\Sigma & \longrightarrow & \Sigma \uplus |B| \\
\downarrow & & \downarrow \\
\Sigma \uplus |A| & \longrightarrow & \Sigma \uplus |B| \uplus |A|
\end{array}$$

where |A| and |B| denote the sets of elements of (the carriers of) A and B.

Sizes of models. A D-size of a model M in an institution with elementary
diagrams $\iota$ is a cardinal number $\lambda$ such that the elementary extension
$\iota_\Sigma(M) = \varphi_{0,\lambda}$ for some $(\lambda, D)$-chain
$(\varphi_{i,j})_{i<j\leq\lambda}$.

For example, if we take D to be the class of FOL finite extensions of
signatures with constants, the D-size of a FOL model M can be taken as the
cardinal of its set |M| of elements, i.e. $|M| = \bigcup_{s \in S} M_s$ where S is
the set of the sorts of $\Sigma$. It can be noticed that in this case
$\lambda$-saturated and D-size $\lambda$ means cardinality $\lambda$.


Theorem 8. [27] If the institution 

1. has pushouts and inductive colimits of signatures, 

2. is semi-exact and inductive-exact on models, 

3. has simple elementary diagrams $\iota$,

4. has existential D-quantification for a (sub)category D of signature mor- 
phisms which is stable under pushouts, 

5. has negations and finite conjunctions, and 

6. the sentence functor preserves inductive colimits 
then any two elementarily equivalent $(\lambda, D)$-saturated $\Sigma$-models of
D-size $\lambda$ are isomorphic.


In the case of FOL, the considered institution is just FOL (i.e. with the
positive diagrams as (abstract) elementary diagrams). Like for Theorem 7,
condition 6. holds by the finiteness of the sentences. Therefore we obtain that
any two elementarily equivalent $\lambda$-saturated FOL models of cardinality
$\lambda$ are isomorphic.

The following application is an i-i generalisation of the rather famous
Keisler-Shelah Theorem [12]. In the actual institutions the following conditions
can be established rather easily.


Corollary 3. Consider a Łoś-institution with a class D of signature
morphisms satisfying the hypothesis of Theorem 8 and which also satisfies the
following:

- it has finite conjunctions and existential D-quantifications,

- each signature morphism preserves products and directed colimits,

- each signature morphism lifts completely ultraproducts.
Let $\lambda$ be an infinite cardinal, U a countably incomplete $\lambda$-good
ultrafilter over I.

- the cardinality of $Sen(\Sigma)$ is strictly smaller than $\lambda$,

- for each model M, if M has a D-size $\lambda$, then each ultrapower
$\prod_U M$ for an ultrafilter U over I of cardinality $\kappa$, has D-size
$\lambda^{\kappa}$.

Assuming the Generalised Continuum Hypothesis, any two elementarily
equivalent models have isomorphic ultrapowers (for the same ultrafilter).


Let us say that an institution has the Keisler-Shelah property if and only if it
satisfies the conclusion of Corollary 3 above.


6 Preservation and Axiomatizability 


6.1 Axiomatizability by Ultraproducts


In the applications the hypotheses of the following are handled by Theorem 6.


Theorem 9. In any institution with sentences preserved by ultraproducts that
has negation and conjunction,

- a class of models is elementary if and only if it is closed under ultraproducts 
and elementary equivalence, 

- a class of models is finitely axiomatizable if and only if both it and its 
complement are elementary. 


6.2 Varieties and Quasi-varieties 


In classical logic it is known that in general the universal Horn sentences are
essentially the most complex sentences admitting initial models, in the sense that
each such sentence is equivalent to a set of universal Horn sentences. It is also
easy to see that Horn sentences are preserved by (closed) sub-models and direct
products. Below we show that this equivalence between the existence of initial
models and the closure under direct products and submodels is independent of
the actual institution.


Inclusion systems. We may use the concept of inclusion system for rephrasing the 
category theoretic concepts of subobjects and quotients (that are traditionally 
defined in terms of monics and epics). 

$(\mathcal{I}, \mathcal{E})$ is an inclusion system⁷ for a category C if
$\mathcal{I}$ and $\mathcal{E}$ are two subcategories with
$|\mathcal{I}| = |\mathcal{E}| = |C|$ such that

- $\mathcal{I}$ is a partial order, and

- every arrow f in C can be factored uniquely as $f = e_f ; i_f$ with
$e_f \in \mathcal{E}$ and $i_f \in \mathcal{I}$.
The arrows of $\mathcal{I}$ are called abstract inclusions, and the arrows of
$\mathcal{E}$ are called abstract surjections. The abstract surjections of some
inclusion systems need not be surjective in the ordinary set-theoretic sense;
take for example the inclusion system for Set where each function is an abstract
surjection and the abstract inclusions are just the identities. An inclusion
system $(\mathcal{I}, \mathcal{E})$ is epic when all abstract surjections are
epics.
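
For the conventional inclusion system of sets and functions, the factorisation
$f = e_f ; i_f$ is the familiar one through the image. The following sketch is
only an illustration: functions are encoded as dictionaries, and the name
`factor` is hypothetical.

```python
def factor(f, domain):
    """Factor f (a dict defined on `domain`) as a surjection then an inclusion."""
    image = frozenset(f[a] for a in domain)
    e_f = {a: f[a] for a in domain}    # abstract surjection: domain -> image
    i_f = {b: b for b in image}        # abstract inclusion: image -> codomain
    return e_f, image, i_f

# Hypothetical function from {1, 2, 3} to {"a", "b", "c"} that misses "b".
domain = {1, 2, 3}
f = {1: "a", 2: "a", 3: "c"}
e_f, image, i_f = factor(f, domain)

# Composing in diagrammatic order (first e_f, then i_f) gives back f.
recomposed = {a: i_f[e_f[a]] for a in domain}
```

Here `e_f` is onto the image and `i_f` is a genuine subset inclusion, so the
factorisation is unique once the middle object is required to be a subset of the
codomain.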

In any category C with an inclusion system,

- A is a subobject of B if there exists an abstract inclusion $A \hookrightarrow B$,
and

- an object B is a quotient representation of A if there exists an abstract
surjection $A \to B$. A quotient of A is an isomorphism class of quotient
representations.


The inclusion system is well-powered, respectively co-well-powered, if the class 
of subobjects, respectively quotients, of each object is a set. 

The category of models for a FOL-signature (S, F, P) admits two meaningful
epic inclusion systems which inherit the conventional inclusion system of the
category of sets and functions. Recall that a model homomorphism
$h\colon M \to N$ is closed when $M_\pi = h^{-1}(N_\pi)$ for each relation symbol
$\pi \in P$, and is strong when $h(M_\pi) = N_\pi$ for each relation symbol
$\pi \in P$. Also a submodel M of a model N is the same as a model homomorphism
$M \to N$ which is a set inclusion for each sort $s \in S$.


inclusion system | abstract inclusions | abstract surjections
closed           | closed submodels    | surjective homomorphisms
strong           | (plain) submodels   | strong surjective homomorphisms


7 In [15] the original definition of [26] is weakened to what they called 'weak
inclusion systems', which are in fact our inclusion systems.


Varieties and quasi-varieties. When C has small products a class of objects of
C closed under isomorphisms
- is a quasi-variety if it is closed under small products and subobjects, and
- is a variety if it is a quasi-variety closed under quotients.
An object A of C is reachable if and only if it has no proper⁸ subobjects.
The following result links the possibility of free models for theories to the
quasi-variety property of the corresponding class of models. It generalises
classical results from universal algebra (see [33] and [42]).


Theorem 10. [54, 20] Consider a semi-exact institution with pushouts of
signatures and with elementary diagrams such that for each signature its category
of models has an initial model, small products, and a co-well-powered epic
inclusion system. If the class of models of each presentation is a quasi-variety,
then the institution is liberal.


The following result extends the conclusion of Theorem 10 with its opposite
implication, thus obtaining an 'if and only if' characterisation of quasi-varieties.


Theorem 11. [54, 20] Consider an institution with elementary diagrams such
that

- the category $Mod(\Sigma)$ of $\Sigma$-models has an initial object
$0_\Sigma$, small products, and a co-well-powered epic inclusion system for each
signature $\Sigma$,

- all model reduct functors preserve the abstract inclusions and the abstract 
surjections, and 

- the model reduct functors corresponding to the elementary extensions reflect 
identities. 
Then each presentation has a reachable initial model if and only if the class of 
models of each presentation is a quasi-variety. 


Under a set of appropriate conditions, the following Quasi-Variety Theorem 
holds in any institution. 


Theorem 12. [55, 16] A class of models is a quasi-variety if and only if it is
the class of models of a set of universal Horn sentences.


The Birkhoff Variety Theorem also holds in an i-i framework (under a set of
appropriate conditions) when we abstract traditional 'equations' with
representable universal basic sentences (abbreviated RUB), which are universal
quantifications of basic sets of sentences by representable signature morphisms.


8 Subobjects which are different from A.

Theorem 13. [16] A class of models is a variety if and only if it is the class of
models of a set of RUB sentences.


6.3 General Birkhoff Axiomatizability 


In FOL, a finer tuned version of the Quasi-Variety Theorem 12 says that
$\mathbb{M}^{**} = S_c^{-1}(P\mathbb{M})$ for each class $\mathbb{M}$ of models,
where $\mathbb{M}^{*}$ is the set of all Horn sentences satisfied by all models of
$\mathbb{M}$, $P\mathbb{M}$ is the class of all products from $\mathbb{M}$, and
$S_c^{-1}\mathbb{M}$ is the class of all closed sub-models of models from
$\mathbb{M}$. Similarly, if instead we consider RUB sentences, cf. Variety
Theorem 13, we have that $\mathbb{M}^{**} = H_r(S_c^{-1}(P\mathbb{M}))$, where
$H_r\mathbb{M}$ is the class of all 'quotients' of models from $\mathbb{M}$.

The i-i concept of Birkhoff-style axiomatizable closure can be captured more
generally by the following concept.
$(Sig, Sen, Mod, \models, \mathcal{F}, \mathcal{B})$ is a Birkhoff institution [22]
if and only if

- $(Sig, Sen, Mod, \models)$ is an institution such that the category of models
$Mod(\Sigma)$ has small products and small directed colimits for each signature
$\Sigma \in |Sig|$,

- $\mathcal{F}$ is a class of filters with $\{\{*\}\} \in \mathcal{F}$, and

- $\mathcal{B}_\Sigma \subseteq |Mod(\Sigma)| \times |Mod(\Sigma)|$ is a reflexive
binary relation for each signature $\Sigma \in |Sig|$

such that

$$\mathbb{M}^{**} = \mathcal{B}_\Sigma^{-1}(\mathcal{F}\mathbb{M})$$

for each signature $\Sigma$ and each class of $\Sigma$-models
$\mathbb{M} \subseteq |Mod(\Sigma)|$, and where $\mathcal{F}\mathbb{M}$⁹ is the
class of all $\mathcal{F}$-filtered products of models from $\mathbb{M}$.

The following is a rather short list of Birkhoff institutions obtained as sub- 
institutions of (infinitary) FOL by varying the type of sentences and via various 
well-known axiomatizability results: 








institution 





IL & 
Q 


FOL 


all ultrafilters 
FOL ultraradicals all ultrafilters 





PL = all ultrafilters 
universal (quantifier-free) FOL sentences zs all ultrafilters 
universal FOL, sentences a {{{«}}} 
HCL. 2; {{I} | I set} 
HCL = all filters 
universal FOL atoms Tr, 5s {{I} | I set} 
EQL pal ay {{Z} | I set} 
VV (universal disjunctions of atoms) es 3 all ultrafilters 
YV (univ. infinitary disj. of atoms) g en {{{*}}} 

yJ (universal-existential sentences) sandwiches ({12])lall ultrafilters. 


9 The class of all filtered products of models modulo F for all filters
$F \in \mathcal{F}$.

where $H_r$ denotes the class of surjective, $H_s$ the class of strong surjective,
$H_c$ the class of closed surjective, $S_w$ the class of inclusive, and $S_c$ the
class of closed inclusive model homomorphisms.


7 Interpolation 


Generalised interpolation in institutions. Craig Interpolation, abbreviated CI,
is classically stated as follows: if $\rho_1 \models \rho_2$ for two sentences,
then there exists a sentence $\rho$, called the interpolant of $\rho_1$ and
$\rho_2$, that uses logical symbols that appear both in $\rho_1$ and $\rho_2$ and
such that $\rho_1 \models \rho \models \rho_2$.
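
In propositional logic an interpolant can be computed by truth tables: project
away from $\rho_1$ the variables that do not occur in $\rho_2$. The sketch below
is a minimal illustration under stated assumptions (sentences encoded as Python
predicates on assignments, hypothetical helper names); it is not the general i-i
construction discussed in this section.

```python
from itertools import product

def assignments(variables):
    """Enumerate all truth assignments to the given variables."""
    for values in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, values))

def entails(variables, premise, conclusion):
    """Semantic consequence checked over all assignments to `variables`."""
    return all(conclusion(a) for a in assignments(variables) if premise(a))

def interpolant(vars1, rho1, shared):
    """Strongest consequence of rho1 over `shared`: quantify private variables."""
    private = [v for v in vars1 if v not in shared]
    def rho(a):
        base = {v: a[v] for v in shared}
        return any(rho1({**base, **extra}) for extra in assignments(private))
    return rho

vars1, vars2 = ["p", "q"], ["q", "r"]
rho1 = lambda a: a["p"] and a["q"]     # p and q, over the signature {p, q}
rho2 = lambda a: a["q"] or a["r"]      # q or r, over the signature {q, r}
shared = [v for v in vars1 if v in vars2]
rho = interpolant(vars1, rho1, shared)

all_vars = ["p", "q", "r"]
premise_holds = entails(all_vars, rho1, rho2)
left = entails(all_vars, rho1, rho)    # rho1 entails the interpolant
right = entails(all_vars, rho, rho2)   # the interpolant entails rho2
```

In this example the computed interpolant behaves like the sentence q, which only
uses the shared symbol.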

An equivalent expression of the above property assumes $\rho_1 \models \rho_2$ in
the union signature $\Sigma_1 \cup \Sigma_2$, and asks for $\rho$ to be in the
intersection signature $\Sigma_1 \cap \Sigma_2$, where $\Sigma_i$ is the signature
of $\rho_i$. If we naturally generalise the inclusion square

$$\begin{array}{ccc}
\Sigma_1 \cap \Sigma_2 & \longrightarrow & \Sigma_1 \\
\downarrow & & \downarrow \\
\Sigma_2 & \longrightarrow & \Sigma_1 \cup \Sigma_2
\end{array}$$

to any commuting square of signature morphisms
$(\varphi_1, \varphi_2, \theta_1, \theta_2)$ like in Fig.1 and replace sentences
$\rho_1$, $\rho_2$, and $\rho$ with sets of sentences $E_1$, $E_2$, and E, we get
the following form of CI: If $\theta_1(E_1) \models_{\Sigma'} \theta_2(E_2)$, then
there exists an interpolant $E \subseteq Sen(\Sigma)$ such that
$E_1 \models_{\Sigma_1} \varphi_1(E)$ and $\varphi_2(E) \models_{\Sigma_2} E_2$. A
commuting square satisfying the above property is called a Craig Interpolation
square.

Notice that in a compact institution, if $E_2$ is finite, then the interpolant
E can be chosen to be finite too. The immediate consequence of this fact is
that in compact institutions having finite conjunctions, this CI formulation is 
equivalent to the more classical single sentences formulation considering single 
sentences rather than sets of sentences. In fact, it is the potential absence of 
conjunctions which motivates the generalisation from single sentences to sets of 
sentences. 

In actual institutions, in general, CI squares can be found among pushout
squares since these constitute the accurate generalisation of intersection-union
squares of signatures. While in the unsorted restriction of FOL all pushout
squares have CI, this is not the case for (many sorted) FOL. Also, in EQL and
HCL, not all pushout squares have CI. This hints that in actual institutions we
should expect CI to hold not for all pushout squares, but for a restricted class
of pushout squares. It is often convenient to capture such classes of CI squares
by restricting independently $\varphi_1$ and $\varphi_2$ to belong to certain
classes of signature morphisms. Therefore, for any classes of signature morphisms
$\mathcal{L}$, $\mathcal{R}$, we say that the institution has the Craig
$(\mathcal{L}, \mathcal{R})$-Interpolation [923] if each pushout square of
signature morphisms of the form

$$\begin{array}{ccc}
\Sigma & \xrightarrow{\; \varphi_1 \in \mathcal{L} \;} & \Sigma_1 \\
{\scriptstyle \varphi_2 \in \mathcal{R}} \downarrow & & \downarrow {\scriptstyle \theta_1} \\
\Sigma_2 & \xrightarrow{\; \theta_2 \;} & \Sigma'
\end{array}$$

is a Craig Interpolation square. The list below anticipates some of the most
representative examples:


institution Rireference 


aimoetad FOE[ a sa feor Bor 
FOL [all finjective on sorts|Cor- (via Thm.) oT 
FOL injective on sorts] alfor Do 


EQL, HCL aÑ Pe Cor. 4] 






Craig interpolation can be established in two major different ways, which 
have rather complementary application domains, via Birkhoff-style axiomatiz- 
ability properties of institutions, or via Robinson consistency. 





7.1 Interpolation Via Birkhoff-Style Axiomatizability 


For a functor $C\colon I^{op} \to Cat$, let
$R = \{R_i \subseteq |C_i|^2\}_{i \in |I|}$ be a |I|-indexed binary relation. We say
(see [22]) that an arrow $u\colon i \to i'$ in I lifts R if and only if for
each $M' \in |C_{i'}|$ and $N \in |C_i|$, if $(C_u(M'), N) \in R_i$ then there
exists $N' \in |C_{i'}|$ such that $C_u(N') = N$ and $(M', N') \in R_{i'}$.


Theorem 14. [22] In a Birkhoff institution
$(Sig, Sen, Mod, \models, \mathcal{F}, \mathcal{B})$, any weak amalgamation square
$(\varphi_1, \varphi_2, \theta_1, \theta_2)$ like in Fig.1 such that

- $Mod(\varphi_1)$ preserves products and directed colimits (of models), and

- $\varphi_2$ lifts $\mathcal{B}$
is a Craig Interpolation square.





Regarding Theorem 14, CI is expected for weak amalgamation squares, which
are slightly more general than pushout squares in semi-exact institutions. The
preservation of products and directed colimits by model reducts is easy in actual
institutions, in fact they are created. For example, the latter holds because of the
finiteness of the arities of the operation and relation symbols of the signatures.
On the other hand, the lifting condition is the only interesting one which sets
limits to CI in applications of Theorem 14.


Corollary 4. [22] For universal FOL and $FOL_{\infty,\omega}$ sentences, HCL,
$HCL_\infty$, universal FOL atoms, EQL, $\forall\vee$, $\forall\vee_\infty$, each
pushout of signature morphisms $(\varphi_1, \varphi_2, \theta_1, \theta_2)$ like
in Fig.1 with $\varphi_2$ injective, is a CI square.


Interpolation via Keisler-Shelah property. In situations when the meta-Birkhoff
axiomatizability is rather weak (in the sense that $\mathcal{B}$ is rather weakly
defined), the lifting condition (on $\varphi_2$) can be rather hard to establish.
The cost is thus shifted from the axiomatizability property to the lifting
condition. A typical example is given by FOL, regarded as a Birkhoff institution
with $\mathcal{B}$ the elementary equivalence relation $\equiv$, and $\mathcal{F}$
the class of all ultrafilters (cf. Theorem 9). However, by invoking the rather
powerful result that FOL is a Keisler-Shelah institution, [50] provides a
characterisation of elementary equivalence $\equiv$ strong enough for
supporting an easy applicability of Theorem 14 and which also leads to the
following corollary:

Corollary 5. In FOL, any pushout square of signature morphisms
$(\varphi_1, \varphi_2, \theta_1, \theta_2)$ like in Fig.1 such that $\varphi_2$
is injective on sorts is a Craig Interpolation square.


7.2 Interpolation Via Consistency 


A set of sentences $E$ for a signature $\Sigma$ in an arbitrary institution is consistent
if it has models, i.e. $E^*$ is not empty. Consistency and interpolation are
related by the concept of 'Robinson consistency'. A commuting square of signature
morphisms $(\varphi_1, \varphi_2, \theta_1, \theta_2)$ like in Fig.1 is a Robinson Consistency square
(abbreviated RC square) if and only if every pair of theories $E_i \subseteq Sen(\Sigma_i)$, $i \in \{1,2\}$,
with 'inter-consistent reducts', i.e. $\varphi_1^{-1}(E_1) \cup \varphi_2^{-1}(E_2)$ is consistent, has
'inter-consistent $\Sigma'$-translations', i.e. $\theta_1(E_1) \cup \theta_2(E_2)$ is consistent.

Robinson Consistency in FOL is classically defined only for intersection-union
squares of signature morphisms; however, as for CI, this restriction is not
necessary. Notice also that in some institutions, usually those supporting strong
Birkhoff-style axiomatizability, such as equational logic EQL, RC
is a trivial property because each set of sentences is consistent.


Theorem 15. In any institution with negation and finite conjunctions and 
which is compact, each commuting square of signature morphisms is a Robinson 
Consistency square if and only if it is a Craig Interpolation square. 


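A span of signature morphisms $\Sigma_1 \xleftarrow{\varphi_1} \Sigma \xrightarrow{\varphi_2} \Sigma_2$ is said to lift
isomorphisms [35] if for each pair of $\Sigma_i$-models $M_i$, $i \in \{1,2\}$, such that $M_1\upharpoonright_{\varphi_1} \cong M_2\upharpoonright_{\varphi_2}$,
there exist $\Sigma_i$-models $N_i$ such that $M_i \cong N_i$ and $N_1\upharpoonright_{\varphi_1} = N_2\upharpoonright_{\varphi_2}$.

A commutative square of signature morphisms $(\varphi_1, \varphi_2, \theta_1, \theta_2)$ like in Fig.1
lifts isomorphisms if the span $\Sigma_1 \xleftarrow{\varphi_1} \Sigma \xrightarrow{\varphi_2} \Sigma_2$ lifts isomorphisms.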


Theorem 16. [35] Assume an institution such that

1. all model homomorphisms preserve satisfaction, i.e. if $h\colon A \to B$ then
$A^* \subseteq B^*$,

2. it has pushouts of signatures and is weakly semi-exact on models,

3. it has elementary diagrams, denoted $\iota$,

4. it has universal quantification over signature morphisms of the forms $\iota_\Sigma(h)$
and $\iota_\Sigma(M)$ for each $\Sigma$-model homomorphism $h\colon M \to N$,

5. it has $\omega$-colimits¹⁰ of models which are preserved by the model reduct
functors,

6. it has negation and finite conjunctions, and

7. it is compact.

Then any weak amalgamation square $(\varphi_1, \varphi_2, \theta_1, \theta_2)$ like in Fig.1 (and in
particular any pushout square) which lifts isomorphisms is a Robinson Consistency
square (and by Theorem 15 a Craig interpolation square too).


10 Here $\omega$ is the totally ordered set of the natural numbers.

In the case of classical FOL interpolation, the institution considered by
Theorem 16 is E(FOL), the institution of the FOL elementary embeddings. Then
only condition 4. might need more justification, the rest being easy (for example,
5. is just Tarski's Elementary Chain Theorem; see Theorem B). In the
case of 4., if the sets of the 'empty' sorts of signatures are finite, it can be shown that
quantification over $\iota_\Sigma(h)$ and $\iota_\Sigma(M)$ reduces to ordinary FOL quantification.
Also, it is easy to see that in FOL a span $(\varphi_1, \varphi_2)$ lifts isomorphisms iff either
one of $\varphi_1$ or $\varphi_2$ is injective on sorts.


Corollary 6. The pushout of a span $\Sigma_1 \xleftarrow{\varphi_1} \Sigma \xrightarrow{\varphi_2} \Sigma_2$ of FOL signature
morphisms such that either $\varphi_1$ or $\varphi_2$ is injective on sorts is a Craig interpolation
square in FOL and FOL$_{\infty,\omega}$.


8 Definability 


The classical definability problem in model theory can be formulated as follows:
for any FOL-signature $(S, F, P)$, a new relation symbol $\pi$ is 'implicitly' defined
by a theory $E$ if and only if it is 'explicitly' defined by the same theory. $\pi$
is implicitly defined when the forgetful reduct $Mod^{FOL}((S, F, P \uplus \{\pi\}), E) \to
Mod^{FOL}(S, F, P)$ is injective, which in this case can be formulated in a more
syntactic but equivalent way as


$$E \cup E[\pi/\pi'] \models_{(S, F, P \uplus \{\pi, \pi'\})} (\forall X)(\pi(X) \Leftrightarrow \pi'(X))$$


for any other new relation symbol $\pi'$ of the same arity and where $E[\pi/\pi']$ is the
copy of $E$ with $\pi$ replaced by $\pi'$, while $\pi$ is explicitly defined if $\pi$ can be 'defined'
by a $(S, F \uplus X, P)$-sentence $E_\pi$, i.e.


$$E \models_{(S, F, P \uplus \{\pi\})} (\forall X)(\pi(X) \Leftrightarrow E_\pi)$$


where $X$ is a string of variables matching the arity of $\pi$.


Generalised definability in arbitrary institutions. The definability problem can be
naturally formulated at the level of abstraction of arbitrary institutions by
abstracting the situation of the signature inclusion $(S, F, P) \hookrightarrow (S, F, P \uplus \{\pi\})$ to
an arbitrary signature morphism.

Let $\varphi\colon \Sigma \to \Sigma'$ be a signature morphism and $E'$ be a $\Sigma'$-theory. Then $\varphi$

- is defined implicitly by $E'$ if the reduct functor $Mod(\Sigma', E') \to Mod(\Sigma)$ is
injective, and

- is defined (finitely) explicitly by $E'$ if for each signature morphism $\theta\colon \Sigma \to
\Sigma_1$, and each sentence $\rho \in Sen(\Sigma'_1)$, there exists a (finite) set of sentences
$E_\rho \subseteq Sen(\Sigma_1)$ such that


$$E' \models_{\Sigma'} (\forall \theta')(\rho \Leftrightarrow \varphi_1(E_\rho))$$

where

$$\begin{array}{ccc}
\Sigma & \xrightarrow{\ \varphi\ } & \Sigma' \\
{\scriptstyle\theta}\downarrow & & \downarrow{\scriptstyle\theta'} \\
\Sigma_1 & \xrightarrow{\ \varphi_1\ } & \Sigma'_1
\end{array}$$

is any pushout square of the span $\Sigma_1 \xleftarrow{\theta} \Sigma \xrightarrow{\varphi} \Sigma'$ of signature
morphisms.





Note that $E_\rho$ is a (finite) set of sentences rather than a single sentence as
in the classical formulations of definability. Explicit definability says that
the new part of $\Sigma'$ introduced by $\varphi$ can be coded only by 'symbols' of $\Sigma$.
Although these formulations coincide when the institution has conjunctions, the
set of sentences formulation gets the right concept of definability for institutions
without conjunctions, such as EQL, HCL, etc. This situation is very similar to
that of interpolation, where the concept of interpolant which is meaningful for
institutions not necessarily having conjunctions is given by a set of sentences
rather than by a single sentence.

One may define the concept of explicit definability such that the quantification
involved is admitted by the institution by requiring $\theta$ to belong to a class
D of signature morphisms stable under pushouts such that the institution has
universal D-quantification. Because such a condition would not affect the results
below, for simplicity of presentation we prefer the unrestricted version of
explicit definability with $\theta$ any signature morphism.


Implicit definability contains the explicit definability. One of the most important
aspects of definability theory is to establish the relationship between implicit and
explicit definability. Although in classical model theory and in most of the actual
institutions explicit definability very easily implies implicit definability, the
abstract model theoretic framework shows that this is in fact a conditioned property,
holding for signature morphisms satisfying a certain condition which can be
formulated by relying upon model amalgamation and elementary diagrams.

In any semi-exact institution with elementary diagrams $\iota$, a signature morphism
$\varphi\colon \Sigma \to \Sigma'$ is tight when for all $\Sigma'$-models $M'$ and $N'$ with a common
$\varphi$-reduct, $M' \otimes M_M \equiv N' \otimes N_N$ implies $M' = N'$ (where $M = M'\upharpoonright_\varphi = N'\upharpoonright_\varphi = N$).










Consider the classical situation when $\varphi$ is a signature morphism in FOL adding
one relation symbol $\pi$. Then the only possible difference between $M'$ and $N'$
could be found in the difference between $M'_\pi$ and $N'_\pi$. But $M'_\pi = \{X \mid
M' \otimes M_M \models \pi(X)\} = \{X \mid N' \otimes N_N \models \pi(X)\} = N'_\pi$.

The situation of this example is quite symptomatic for most of the actual
institutions. $M' \otimes M_M$ is just the expansion of $M'$ interpreting the elements
of $M$ by themselves. Therefore $M' \otimes M_M \equiv N' \otimes N_N$ implies that each atom
in the extended signature is satisfied either by none or by both models, which
means that each symbol newly added by $\varphi$ gets the same interpretation in $M'$
and $N'$. This argument holds in all actual institutions in which models interpret
the symbols of the signatures as sets and functions.


Corollary 7. A FOL signature morphism is tight if and only if it is surjective
on sorts.


Proposition 5. [49] In any semi-exact institution with elementary diagrams, 
each tight signature morphism is defined implicitly whenever it is defined explic- 
itly. 


For the rest of this section we focus on what is usually considered to be
the 'definability problem' in model theory, i.e. that explicit definability contains
implicit definability (whatever is defined implicitly is also defined explicitly).
A signature morphism $\varphi$ has the (finite) definability property [49] iff
a theory defines $\varphi$ (finitely) explicitly whenever it defines $\varphi$ implicitly.


8.1 Definability Via Interpolation 


Craig-Robinson interpolation. Let us strengthen the Craig interpolation property
by adding to the "primary" premises $E_1$ a set $\Gamma_2$ (of $\Sigma_2$-sentences) as
"secondary" premises. In any institution, a commuting square of signature morphisms
$(\varphi_1, \varphi_2, \theta_1, \theta_2)$ like in Fig.1 is a Craig-Robinson Interpolation square
(abbreviated CRI square) when for each set $E_1$ of $\Sigma_1$-sentences and all sets $E_2$ and
$\Gamma_2$ of $\Sigma_2$-sentences, if $\theta_1(E_1) \cup \theta_2(\Gamma_2) \models_{\Sigma'} \theta_2(E_2)$, then there exists a set $E$ of
$\Sigma$-sentences such that $E_1 \models_{\Sigma_1} \varphi_1(E)$ and $\Gamma_2 \cup \varphi_2(E) \models_{\Sigma_2} E_2$.

It is easy to notice that any CRI square is also a CI square (take $\Gamma_2$ empty).
The following gives a sufficient condition under which CI and CRI are equivalent
interpolation concepts.











Proposition 6. [29] If the institution has implications and is compact, a commuting
square of signature morphisms is a Craig-Robinson Interpolation square if
and only if it is a Craig Interpolation square.


The following can be regarded as the i-i generalisation of the Beth Definability 
Theorem from classical model theory. 


Theorem 17. [49] In any semi-exact (compact) institution having Craig-Robinson
$(L, R)$-interpolation for classes $L$ and $R$ of signature morphisms which
are stable under pushouts, any signature morphism in $L \cap R$ has the (finite)
definability property.


By the interpolation results for FOL presented above (see Corollary 6) and
because tight signature morphisms in FOL are those which are surjective on the
sorts (Corollary 7), we get the following:

Corollary 8. In FOL, any signature morphism which is injective on the sorts
has the finite definability property.

Moreover, the equivalence between implicit and explicit definability holds in
FOL for the signature morphisms which are bijective on the sorts.


8.2 Definability Via Axiomatizability 


Definability Theorem 17 relies on Craig-Robinson interpolation, which does not
hold for institutions having strong axiomatizability properties, such as HCL and
EQL. In order to deal with such examples, another definability result has been
developed, which relies on axiomatizability properties and which can be applied to a
series of actual situations when Craig-Robinson interpolation fails.

The abstract Beth definability via axiomatizability relies on a 'lifting' condition
on the signature morphism. Given a family of relations $R = \{R_\Sigma \subseteq
|Mod(\Sigma)| \times |Mod(\Sigma)|\}_{\Sigma \in |Sig|}$ indexed by the category of the signatures of an
institution, a signature morphism $\varphi\colon \Sigma \to \Sigma'$ lifts weakly $R$ iff for each pair of
$\Sigma'$-models $M'$ and $N'$, if $(M'\upharpoonright_\varphi, N'\upharpoonright_\varphi) \in R_\Sigma$ then there exists $P'$, a $\varphi$-expansion of
$N'\upharpoonright_\varphi$, such that $(M', P') \in R_{\Sigma'}$. We may recall that the first (non-weak) lifting
concept has been used by the interpolation theorem of Sect. 7.1. Notice that a signature
morphism lifts weakly a family of relations $R$ whenever it lifts $R$.

However, the result below uses the lifting condition in the reverse direction to that
of the interpolation theorem.


Theorem 18. Consider a (compact) semi-exact Birkhoff institution
$(Sig, Sen, Mod, \models, F, B)$ and a class $S \subseteq Sig$ of signature morphisms which is
stable under pushouts and such that for each $\varphi \in S$

- $\varphi$ lifts weakly $B^{-1}$, and

- $Mod(\varphi)$ preserves small products and directed colimits.

Then any signature morphism in $S$ has the (finite) definability property.


The core technical condition which should be established in order to apply
Theorem 18 is, as for Theorem 14, the lifting condition on $\varphi$. In the case of
FOL, this leads to the following.


Corollary 9. [Z9] Any FOL signature morphism which is surjective on the sort 
and operation symbols has the finite definability property in the institutions of 
the universal Horn sentences, and has the definability property in the institutions 
of universal sentences, of the universal infinitary sentences, and of the universal 
infinitary Horn sentences. 

Any FOL signature morphism which is bijective on the sort and operation
symbols and injective on the relation symbols has the finite definability property
in the institutions of the atomic sentences and of the equations, and has the
definability property in the institutions of $\forall\vee$ and $\forall\vee_\infty$.


9 Other Topics


Due to space constraints, we cannot present here all the important topics of today's
i-i model theory. Let us briefly mention some of those which we could not
develop.


Possible worlds semantics. This development [28] refers to the treatment of 
modalities and their applications independently of the underlying logic. More 
specifically, given a base institution with model amalgamation, on the semantics 
side we internalise the concept of frame, and on the syntactic side we extend 
the existing sentences with modalities. Our concept of frame is allowed to enjoy 
a flexible degree of sharing which is modelled by the means of an institution 
morphism from the base institution to a ‘domain’ institution. The extension of 
modal sentences is based on our internal logic approach to logical connectives 
and quantifiers. Then on top of the satisfaction relation of the base institution, 
we define a modal satisfaction relation between frames and modal sentences. 
This generates a new 'modal' institution on top of the base institution, and due
to the very mild conditions on the base institution, this 'modalisation' procedure
can be applied to a wide variety of actual institutions.

By employing the institution-independent method of ultraproducts, [28]
proves a fundamental institution-independent preservation result for modal
satisfaction, namely that each modal sentence is preserved by ultraproducts of frames.
Immediate consequences of this result include compactness of possible worlds
semantics.


Grothendieck institutions. Grothendieck institutions [18] generalise the flatten- 
ing Grothendieck construction from (indexed) categories to (indexed) institu- 
tions. Regarded from a fibration theoretic angle, Grothendieck institutions are 
institutions for which their category of signatures is fibred. On the one hand, 
the actual institutions with many sorted signatures appear naturally as fibred 
institutions determined by the fibrations given by the functor mapping each sig- 
nature to its set of sort symbols. In this sense, fibred institutions can be regarded 
as the reflection of the many sortedness phenomenon at the level of institution 
theory. On the other hand, the Grothendieck construction on institutions is more
adequate for modelling heterogeneous multi-logic environments. Any system of
institutions which are related by institution morphisms can be flattened by the
Grothendieck construction to a homogeneous institution, as has been done in
the case of CafeOBJ or heterogeneous specification with CASL extensions
[£7]. In other words, this can be interpreted as putting together a system of 
institutions into a single institution such that their individual identities and the 
relationships between them are fully retained. 

The Grothendieck construction on institutions can be done in two variants,
using institution morphisms as in [18] or using institution comorphisms as in
[46]. In the case when the institution morphisms or comorphisms correspond to
adjunction situations between the categories of signatures of the institutions, the
morphism-based and comorphism-based Grothendieck institutions can be shown
to be isomorphic [47].

An important class of problems posed by the Grothendieck, or fibred, institutions
is that of lifting model-theoretic properties from the 'local' level of index
institutions, or fibres, to the 'global' level of the Grothendieck, or fibred,
institution. While [17] and [18] investigate the lifting of theory colimits, free models,
model amalgamation, and inclusion systems, [23] solves the interpolation problem
for Grothendieck institutions.


Stratified institutions. They have been introduced by Marc Aiguier and Fabrice 
Barbier (see [3]) in order to model valuations of variables or states of models. 
Although it is possible to develop a great deal of model theory using this i-i 
technique, its biggest promise seems to be for the problem of combinations of 
logics, which is currently one of the most challenging problems. 


Proof-theoretic aspects. Recently there has been a successful attempt to enrich
institutions with proof theoretic structure, not by amalgamating the often
conflicting model theoretic culture of institutions and the proof theoretic culture
of type theory, but by building an institutional proof theory from scratch, extending
categorical logic [87] to represent proofs as arrows in categories of sentences.

The recent paper [24] introduces a concept of proof rules for institutions 
and argues that the proof systems of the actual institutions with proofs are 
freely generated by their presentations as systems of proof rules. It also shows 
that proof-theoretic quantification, an institutional refinement of the (meta-)rule 
of Generalisation from classical logic, can also be added freely to any proof 
system. By applying these universal properties, [24] is able to provide some 
general compactness results for proof systems and some general soundness results 
for institutions with proofs. 

The study of proof systems for institutional logic is emerging as a very promising
new area with many interesting open questions.


10 Philosophical Roots 


In this final section I would like to share with the interested readers some personal 
reflections about some philosophical aspects of institutions from the perspective 
of Tibetan Buddhism, a spiritual and philosophical tradition shared by the fa- 
thers of institution theory, Joseph Goguen and Rod Burstall, and by the author 
of this survey. 

Institution theory is not only a mathematical theory. In fact, I think its 
main value resides in its unique way to approach mathematical and computing 
science phenomena. In my view, the institutional way can be seen as an effect of a 
Buddhist (trained) mind and an application of Sunyata, the Buddhist Mahayana 
perspective on reality. 

The highest explanation of Sunyata has been developed by the Madhyamaka
Prasangika philosophical school, which started in the great Buddhist
monastic university of Nalanda about 2000 years ago. Perhaps the most prominent
philosophical figure of this school was Acharya Nagarjuna, who wrote a
series of treatises consisting mainly of very sophisticated philosophical and logical
arguments supporting the doctrine of Sunyata. The Madhyamaka Prasangika
philosophical viewpoint has been inherited and preserved to our days by all
traditions of Tibetan Buddhism.

In brief, Sunyata means the emptiness of all phenomena, either mind or
matter, of an inherent nature. All phenomena thus arise on the basis of the
so-called 'co-dependent origination', which at a certain level can be thought
of as a very profound distributed network of interdependencies. This view avoids
both extremes of eternalism (things possess an inherent nature) and of nihilism
(nothing exists), hence 'Madhyamaka', translated as 'Middle Way'.

When applied to modern science, this offers a non-essentialist perspective.
While some branches of modern science, most notably quantum physics,
resonate strongly with the Madhyamaka Prasangika explanation of reality in a rather
independent way (for the interested reader we recommend the recent survey [58]),
i-i methodology has been directly influenced by this philosophical perspective.

Sunyata also means a lack of reference point, a kind of groundlessness.
Institutions realize this in a very transparent way, because they truly transcend the
idea of commitment to particular logics. Moreover, concepts such as institution
(co)morphisms, which are central to institution theory, constitute efficient
technical tools for understanding the immensely vast network of interdependencies
between logical systems. By contrast, the original abstract model theory
programme of Barwise and Feferman failed exactly because it was not based on
such a groundless view of logic, still having classical logic as a reference point.

The rather intensive use of category theory for institutions, in various ways
and at various levels, is another illustration of the groundless aspect of institution
theory. By emphasizing the relationships between objects rather than their
internal structure, category theory might be the single mathematical area which
realizes the principle of interdependency so close to its Buddhist meaning.

This philosophical viewpoint underlying institution theory is very intimately 
connected to the feeling of elegance and clarity experienced when using the i-i 
methodology, either in computing science or in model theory. Due to the space 
limitations of this paper, we leave this discussion at this point, with the promise 
to come back sometime with a full essay on the connections between Buddhism 
and i-i thinking.


Abstract. The Semantic Web (SW) is viewed as the next generation of the Web,
one that enables intelligent software agents to process and aggregate data autonomously.
Ontology languages provide basic vocabularies to semantically mark up data
on the SW. The number of SW languages has grown in recent years.
These languages, such as RDF, RDF Schema (RDFS), the OWL suite of
languages, the OWL⁻ suite, and SWRL, are based on different semantics, such as
RDFS-based, description logic-based, and Datalog-based semantics. The relationship
among the various semantics poses a challenge for the SW community in making
the languages interoperable. Institutions provide a means of reasoning about
software specifications regardless of the logical system. This makes them an ideal
candidate to represent and reason about the various languages in the Semantic
Web. In this paper, we construct institutions for the SW languages and use institution
morphisms to relate them. We show that the RDF framework together with the
RDF serializations of SW languages forms an indexed institution. This allows the
use of Grothendieck institutions to combine Web ontologies described in various
languages.


1 Introduction 


The family of Semantic Web (SW) languages has grown considerably in recent years
and we expect it to continue to grow. This is somewhat surprising for the
SW community and it contradicts the initial intentions of the SW creators. But it is
a reality and we have to live with it. The growth concerns especially the languages
describing Web ontologies. Here are several examples: OWL with its three dialects
(Lite, DL, and Full) [20], SWRL [15], SWRL FOL [21], DLP [12], OWL⁻ with its
three dialects (Lite⁻, DL⁻, and Full⁻) [5], WRL [1], and the list does not end here. For
these languages, different definitions of their semantics were proposed in the literature:
model-theoretic semantics [20,13], RDF based semantics [1,20], first-order logic based
semantics [15,21], frame logic semantics [5], Datalog semantics [5], Z semantics [8,18],
and so on. This gives rise to some confusion and debate about the meaning of the
hierarchy of SW languages as illustrated in Tim Berners-Lee's well-known
"Semantic Web Stack" diagram (Fig. 1).


[Figure: layered stack diagram; the labels recoverable from the original include Unicode, Digital Signature, Rules, and Trust.]


Fig. 1. The Semantic Web stack of languages


In this paper we use institution theory in order to investigate the exact
relationships among these languages.

The notion of institutions was introduced to formalize the concept of "logical
systems". Institutions provide a means of reasoning about software specifications
regardless of the logical system. Hence, they serve as a natural candidate to study the
relationship among the various SW languages, as these are based on different logical
systems (semantics).

In this paper, we investigate the relationship among the languages RDF [17], RDF
Schema [4], the OWL suite [20], and the OWL⁻ suite by defining their respective
institutions and relating these institutions using morphisms or comorphisms. A main
advantage is a better understanding of the semantical relationship among the various SW
languages. Here we focus only on the RDF triple-based semantics. We show that the RDF
framework (RDF and RDF Schema) together with the RDF serializations of SW languages
forms an indexed institution, and hence the whole framework can be organized as a
Grothendieck institution [7]. An interesting fact is that the construction of the indexed
institution is based on a diagram of RDF theories. We define a method of constructing
institutions starting from theories and then we extend it to diagrams of categories and
indexed institutions. We believe that in this way we answer the question regarding the
layering of SW languages [22]. Semantically, the "stack" of SW languages depicted by
Berners-Lee is an indexed institution. This indexed institution produces a Grothendieck
institution which offers a formal framework for combining ontologies written in various
languages. In this way, all SW languages can "live" together.

The rest of the paper is organized as follows. In Section 2, we briefly present the 
background information on SW languages and institutions. In Section 3, we define the 
institutions of (bare) RDF and RDF Schema languages. These institutions are used as 
the basis on which one of the semantics for SW languages is constructed using a method 
presented in Section 2. Section 4 is devoted to the construction of the institutions defin- 
ing SW languages. In Section 5, we construct an indexed institution based on a diagram 
of RDF theories and we show that the RDF-based semantics of SW languages can 
be defined as institution comorphisms from these languages to the indexed institution. 
Section 6 concludes the paper and discusses future work directions.


2 Preliminaries


2.1 Semantic Web Languages 


The Semantic Web is envisioned as the next generation of the current World Wide Web,
in which information is semantically marked up so that intelligent software agents can
autonomously understand, process and aggregate data. This ability is realized through
the development of a "stack" of languages, as depicted by Berners-Lee in Fig. 1.

Based on mature technologies such as XML, Unicode and URI (Uniform Resource
Identifier), the Resource Description Framework (RDF) is the foundation of later
languages in the SW. RDF is a model of metadata defining a mechanism for describing
resources that makes no assumptions about a particular application domain. It provides
a simple way to make statements about Web resources. An RDF document is a
collection of triples: statements of the form (subject, predicate, object), where subject is
the resource we are interested in, predicate specifies the property or characteristic of the
subject and object states the value of the property. This is the basic structure of the
subsequent ontology languages. RDF also defines vocabularies for constructing containers
such as bags, sequences and lists.
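
As an informal illustration of the triple structure just described, the following Python sketch models an RDF document as a set of (subject, predicate, object) tuples and queries it. The `Triple` type, the helper functions, and the `ex:` resources are hypothetical names chosen for this sketch; they are not part of RDF or of any RDF library.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    """One RDF-style statement: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


# A toy document; all resource names are made up for illustration.
document: Set[Triple] = {
    Triple("ex:lion", "ex:eats", "ex:zebra"),
    Triple("ex:lion", "ex:livesIn", "ex:savanna"),
    Triple("ex:zebra", "ex:eats", "ex:grass"),
}


def objects_of(doc: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Values of the given property for the given subject."""
    return {t.obj for t in doc if t.subject == subject and t.predicate == predicate}


def subjects_with(doc: Set[Triple], predicate: str, obj: str) -> Set[str]:
    """Subjects whose given property has the given value."""
    return {t.subject for t in doc if t.predicate == predicate and t.obj == obj}
```

Treating the document as a set reflects that it is an unordered collection of statements; both queries are simple filters over that collection.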

RDF Schema [4] provides additional vocabularies for describing RDF documents. It 
defines semantical entities such as Resource, Class, Property, Literal and various prop- 
erties about these entities, such as subClassOf, domain, range, etc. In RDF Schema, Re- 
source is the universe of description. It can be further categorized as classes, properties, 
datatypes or literals. With these semantical constructs, RDF Schema can be regarded as
the basic ontology language.

The Web resources are represented by full URIs, consisting of a prefix, representing
a namespace, and a name representing the actual resource that is being described. In its
full form, the prefix and the resource name are separated by a #. In shorthand form, the
prefix can be represented by a shorter name and it is separated from the actual name by
a colon (:), as the following example shows. After a resource has been introduced by an
rdf:ID construct (in shorthand form of the URI), it can be subsequently accessed and
augmented by rdf:about constructs. When there is no possibility of confusion,
the prefix can be omitted (but not the separator #).











Example 1. The following RDF fragment defines an RDFS class Carnivore, which is a
subclass of Animal.


<rdfs:Class rdf:ID="Animal"/>

<rdfs:Class rdf:ID="Carnivore">
<rdfs:subClassOf rdf:resource="#Animal"/>

</rdfs:Class>


In this example, the namespace is the URI http://ex.com/animals. The full URI for the
class Animal is http://ex.com/animals#Animal.
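
The naming convention above can be sketched in a few lines of Python. This is only an illustration of the convention as stated in the text (prefix and name joined by #); the prefix table, the prefix name `ex`, and the function names are assumptions made for the sketch, not part of any RDF toolkit.

```python
# Hypothetical prefix table: shorthand prefix -> namespace URI (without '#').
NAMESPACES = {"ex": "http://ex.com/animals"}


def expand(qname: str, namespaces: dict = NAMESPACES) -> str:
    """Expand a shorthand name 'prefix:name' into its full URI 'namespace#name'."""
    prefix, sep, local = qname.partition(":")
    if not sep or not local:
        raise ValueError(f"not a shorthand name: {qname!r}")
    return namespaces[prefix] + "#" + local


def resolve_local(ref: str, base: str) -> str:
    """Resolve a prefix-less reference such as '#Animal' against the namespace."""
    if not ref.startswith("#"):
        raise ValueError(f"expected a reference starting with '#': {ref!r}")
    return base + ref
```

For instance, both `expand("ex:Animal")` and `resolve_local("#Animal", "http://ex.com/animals")` yield the full URI given in Example 1.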


The ability to organize and categorize domain knowledge is a necessity for software 
agents to process and aggregate Web resources. Domain knowledge is usually organized 
as inter-related conceptual entities in a hierarchy. The RDF language is not expressive 
enough to tackle such complexity. 

In 2003, W3C published a new ontology language, the Web Ontology Language 
(OWL) [20]. Based on description logics and RDF Schema, the OWL suite consists 
of three sublanguages: Lite, DL and Full, with increasing expressiveness. The three 
sublanguages are meant for user groups with different requirements of expressiveness 
and decidability. OWL Lite and DL are decidable whereas OWL Full is generally not. 

By saying that an ontology language is decidable, we mean that its core
reasoning problems, namely concept subsumption, concept/ontology satisfiability and
instantiation checking, are decidable [16].

One of the major extensions of OWL over RDF Schema is the ability to define re- 
strictions using existing classes and properties. By using restrictions, new classes can be 
built incrementally. In OWL, conceptual entities are organized as classes in hierarchies. 
Individuals are grouped under classes and are called instances of the classes. Classes, 
properties and individuals can be related by properties. 


Example 2. The following OWL fragment shows the definition of an object property
eats and a class Carnivore, which is further defined as an animal that only eats animals.
This is achieved through the use of an allValuesFrom restriction in OWL.


<owl:ObjectProperty rdf:ID="eats"/>
<owl:Class rdf:about="#Carnivore">
<rdfs:subClassOf>
<owl:Restriction><owl:allValuesFrom>
<owl:Class rdf:resource="#Animal"/>
</owl:allValuesFrom>
<owl:onProperty>
<owl:ObjectProperty rdf:resource=
"http://ex.com/animals#eats"/>
</owl:onProperty>
</owl:Restriction></rdfs:subClassOf>
</owl:Class>


The class Carnivore is defined to be a subclass of an OWL restriction that defines an
anonymous class whose members eat only Animals.


The OWL⁻ suite of languages, namely Lite⁻, DL⁻ and Full⁻, is a restricted
variant of the OWL languages. OWL Lite⁻ and DL⁻ are strict subsets of OWL Lite and DL
respectively and they can be directly translated into Datalog. According to [5], the main
advantages of OWL⁻ include the following. First, by translating OWL⁻ to Datalog,
highly efficient deductive database querying capabilities can be used. Second,
rule extensions and query languages can be easily implemented on top of OWL⁻.

In order to expand the expressiveness of SW languages, several rule extensions
have been proposed. SWRL is a direct extension of OWL DL that incorporates
Horn-style rules. Among other things, it supports (universally quantified) variables and
built-in predicates/functions for various data types.

In contrast, the Web Rules Language (WRL) [1] is a rule-based ontology language.
Based on deductive databases and logic programming, it is designed to be
complementary to OWL, which is strong at checking subsumption relationships among
concepts. WRL focuses on checking instance data and on the specification of and reasoning
about arbitrary rules. Moreover, WRL adopts the Closed World Assumption, whereas
OWL and SWRL adopt the Open World Assumption.


2.2 Institutions 


Institutions supply a uniform way of structuring theories in various logical systems.
Many logical systems have been proved to be institutions. Recent research has shown that
institutions are useful in designing tools that support verification over multiple logics.
The basic reference for institutions is [10]. A comprehensive overview of institutions
and their applications can be found in [6]. A well-structured treatment of the various
institution morphisms and many other recent constructions can be found in [11]. A
recent application of institutions to formalizing information integration is given in
[9]. The Grothendieck institution construction we use in this paper follows [7].
Institutions make intensive use of category theory; we recommend [2] for a detailed
presentation of categories and their applications in computer science.

In this section we recall the main definitions for institutions and introduce two
new constructions. The first is simple: it generalizes the notion of theoroidal
comorphism by allowing sentences of the source institution to be encoded by conjunctions of
sentences in the target institution. The second is more complex and is used to construct
indexed institutions starting from diagrams of semantically constrained theories from
a basic institution. We use this construction to define the indexed institutions based on
RDF triples corresponding to SW languages.

An institution is a quadruple $\mathcal{S} = (\mathsf{Sign}, \mathsf{Mod}, \mathsf{sen}, \models)$, where $\mathsf{Sign}$ is a category
whose objects are called signatures, $\mathsf{Mod} : \mathsf{Sign}^{op} \to \mathsf{Cat}$ is a functor which associates
with each signature $\Sigma$ a category whose objects are called $\Sigma$-models, $\mathsf{sen}$ is a functor
$\mathsf{sen} : \mathsf{Sign} \to \mathsf{Set}$ which associates with each signature $\Sigma$ a set whose elements are
called $\Sigma$-sentences, and $\models$ is a function which associates with each signature $\Sigma$ a binary
relation $\models_\Sigma \subseteq |\mathsf{Mod}(\Sigma)| \times \mathsf{sen}(\Sigma)$, called the satisfaction relation, such that for each
signature morphism $\phi : \Sigma \to \Sigma'$ the satisfaction condition

$$\mathsf{Mod}(\phi^{op})(M') \models_\Sigma \varphi \iff M' \models_{\Sigma'} \phi(\varphi)$$

holds for each model $M' \in |\mathsf{Mod}(\Sigma')|$ and each sentence $\varphi \in \mathsf{sen}(\Sigma)$. The functor
$\mathsf{sen}$ abstracts the way the sentences are constructed from signatures (vocabularies). The
functor $\mathsf{Mod}$ is defined over the opposite category $\mathsf{Sign}^{op}$ because a "translation
between vocabularies" $\phi : \Sigma \to \Sigma'$ defines a forgetful functor $\mathsf{Mod}(\phi^{op}) : \mathsf{Mod}(\Sigma') \to
\mathsf{Mod}(\Sigma)$ such that for each $\Sigma'$-model $M'$, $\mathsf{Mod}(\phi^{op})(M')$ is $M'$ viewed as a $\Sigma$-model.
The satisfaction condition may be read as "$M'$ satisfies the $\phi$-translation of $\varphi$ iff $M'$,
viewed as a $\Sigma$-model, satisfies $\varphi$", i.e., the meaning of $\varphi$ is not changed by the
translation $\phi$.
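
The satisfaction condition can be checked mechanically on a small example. The following Python sketch assumes a toy institution that is not taken from the paper: signatures are finite sets of propositional atoms, sentences are signed atoms, models are sets of true atoms, and signature morphisms are renaming dictionaries. All function names are illustrative.

```python
from itertools import chain, combinations

def reduct(phi, model_prime):
    """Mod(phi^op): view a Sigma'-model as a Sigma-model along phi."""
    return {a for a, b in phi.items() if b in model_prime}

def translate(phi, sentence):
    """sen(phi): rename the atom of a signed atom (sign, atom)."""
    sign, atom = sentence
    return (sign, phi[atom])

def satisfies(model, sentence):
    """('+', a) holds iff a is true in the model; ('-', a) iff it is not."""
    sign, atom = sentence
    return (atom in model) if sign == "+" else (atom not in model)

def powerset(atoms):
    """All models over a signature, i.e. all subsets of its atoms."""
    atoms = sorted(atoms)
    subsets = chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1))
    return [set(c) for c in subsets]

def satisfaction_condition_holds(sigma, sigma_prime, phi):
    """Check reduct(M') |= s  iff  M' |= phi(s) for all M' and s."""
    sentences = [(sign, a) for a in sigma for sign in "+-"]
    return all(
        satisfies(reduct(phi, m), s) == satisfies(m, translate(phi, s))
        for m in powerset(sigma_prime)
        for s in sentences)

sigma = {"p", "q"}
sigma_prime = {"a", "b", "c"}
phi = {"p": "a", "q": "a"}  # a non-injective renaming (hypothetical example)
print(satisfaction_condition_holds(sigma, sigma_prime, phi))
```

The check enumerates every model of the target signature, so it is only feasible for very small signatures; it is meant to illustrate the definition, not to serve as a proof technique.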

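We often use $\mathsf{Sign}(\mathcal{S})$, $\mathsf{Mod}(\mathcal{S})$, $\mathsf{sen}(\mathcal{S})$, $\models_{\mathcal{S}}$ to denote the components of the
institution $\mathcal{S}$. If $\phi : \Sigma \to \Sigma'$ is a signature morphism, then the $\Sigma$-model $\mathsf{Mod}(\phi^{op})(M')$ is
also denoted by $M'|_\phi$ and we call it the $\phi$-reduct of $M'$. We also often write $\phi(\varphi)$ for
$\mathsf{sen}(\phi)(\varphi)$.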

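If $F$ is a set of $\Sigma$-sentences and $M$ a $\Sigma$-model, then $M \models_\Sigma F$ denotes the fact that $M$
satisfies all the sentences in $F$. Let $F^\bullet$ denote the set $F^\bullet = \{\varphi \mid (\forall M \text{ a model})\; M \models_\Sigma
F \Rightarrow M \models_\Sigma \varphi\}$. A sentence $\varphi$ is a semantical consequence of $F$, written $F \models_\Sigma \varphi$, iff
$\varphi \in F^\bullet$.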
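
For a finite signature the closure of a set of sentences can be computed by brute force. The sketch below again assumes a toy propositional institution (signed atoms as sentences, sets of true atoms as models); it is an illustration of the definition only, and the names `closure` and `all_models` are hypothetical.

```python
from itertools import chain, combinations

def satisfies(model, sentence):
    """('+', a) holds iff a is true in the model; ('-', a) iff it is not."""
    sign, atom = sentence
    return (atom in model) if sign == "+" else (atom not in model)

def all_models(sigma):
    """Every subset of the atoms is a model of the signature."""
    atoms = sorted(sigma)
    subsets = chain.from_iterable(
        combinations(atoms, r) for r in range(len(atoms) + 1))
    return [frozenset(c) for c in subsets]

def closure(sigma, F):
    """Sentences satisfied by every model that satisfies all of F."""
    models_of_F = [m for m in all_models(sigma)
                   if all(satisfies(m, s) for s in F)]
    sentences = [(sign, a) for a in sorted(sigma) for sign in "+-"]
    return {s for s in sentences
            if all(satisfies(m, s) for m in models_of_F)}

sigma = {"p", "q"}
F = {("+", "p")}
print(closure(sigma, F))
```

Note that an unsatisfiable set of sentences has no models, so its closure is the set of all sentences over the signature.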

A specification (presentation) is a way to represent the properties of a system
independently of any model (= implementation). Formally, a specification is a pair $(\Sigma, F)$, where
$\Sigma$ is a signature and $F$ is a set of $\Sigma$-sentences. A $(\Sigma, F)$-model is a $\Sigma$-model $M$ such
that $M \models_\Sigma F$. We sometimes write $(\Sigma, F) \models \varphi$ for $F \models_\Sigma \varphi$. A specification morphism
from $(\Sigma, F)$ to $(\Sigma', F')$ is a signature morphism $\phi : \Sigma \to \Sigma'$ such that $\phi(F) \subseteq F'^\bullet$. We
denote by $\mathsf{Spec}$ the category of specifications. A theory is a specification $(\Sigma, F)$ with
$F = F^\bullet$; the full subcategory of theories in $\mathsf{Spec}$ is denoted by $\mathsf{Th}$. The inclusion
functor $U : \mathsf{Th} \to \mathsf{Spec}$ is an equivalence of categories, having a left-adjoint-left-inverse
$\mathcal{F} : \mathsf{Spec} \to \mathsf{Th}$ given by $\mathcal{F}(\Sigma, F) = (\Sigma, F^\bullet)$ on objects and identity on morphisms.

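Given an institution $\mathcal{S} = (\mathsf{Sign}, \mathsf{Mod}, \mathsf{sen}, \models)$, the theoroidal institution $\mathcal{S}^{th}$ of $\mathcal{S}$
is the institution $\mathcal{S}^{th} = (\mathsf{Th}, \mathsf{Mod}^{th}, \mathsf{sen}^{th}, \models^{th})$, where $\mathsf{Mod}^{th}$ is the extension of $\mathsf{Mod}$
to theories, $\mathsf{sen}^{th}$ is $\mathit{sign}; \mathsf{sen}$ with $\mathit{sign} : \mathsf{Th} \to \mathsf{Sign}$ the functor which forgets the
sentences of a theory, and $\models^{th} = |\mathit{sign}|; \models$.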

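Let $\mathcal{S} = (\mathsf{Sign}, \mathsf{Mod}, \mathsf{sen}, \models)$ and $\mathcal{S}' = (\mathsf{Sign}', \mathsf{Mod}', \mathsf{sen}', \models')$ be two institutions.
An institution morphism $(\Phi, \beta, \alpha) : \mathcal{S} \to \mathcal{S}'$ consists of:

1. a functor $\Phi : \mathsf{Sign} \to \mathsf{Sign}'$,

2. a natural transformation $\beta : \mathsf{Mod} \Rightarrow \Phi^{op}; \mathsf{Mod}'$, i.e., a natural family of functors
$\beta_\Sigma : \mathsf{Mod}(\Sigma) \to \mathsf{Mod}'(\Phi(\Sigma))$, and

3. a natural transformation $\alpha : \Phi; \mathsf{sen}' \Rightarrow \mathsf{sen}$, i.e., a natural family of functions
$\alpha_\Sigma : \mathsf{sen}'(\Phi(\Sigma)) \to \mathsf{sen}(\Sigma)$,

such that the following satisfaction condition holds:

$$M \models_\Sigma \alpha_\Sigma(\varphi') \iff \beta_\Sigma(M) \models'_{\Phi(\Sigma)} \varphi'$$

for any $\Sigma$-model $M$ in $\mathcal{S}$ and $\Phi(\Sigma)$-sentence $\varphi'$ in $\mathcal{S}'$. Usually, institution
morphisms are used to express the embedding relationship. An example of an institution
morphism is $(\Phi, \beta, \alpha) : \mathcal{S}^{th} \to \mathcal{S}$, which expresses the embedding of $\mathcal{S}$ in $\mathcal{S}^{th}$. Here $\Phi : \mathsf{Th} \to \mathsf{Sign}$
is given by $\Phi(\Sigma, F) = \Sigma$, $\beta : \mathsf{Mod}^{th} \Rightarrow \Phi^{op}; \mathsf{Mod}$ is defined such that $\beta_{(\Sigma, F)}$ is the
identity, and $\alpha : \Phi; \mathsf{sen} \Rightarrow \mathsf{sen}^{th}$ is defined such that $\alpha_{(\Sigma, F)}$ is the identity.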

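An institution comorphism $(\Phi, \beta, \alpha) : \mathcal{S} \to \mathcal{S}'$ consists of:

1. a functor $\Phi : \mathsf{Sign} \to \mathsf{Sign}'$,

2. a natural transformation $\beta : \Phi^{op}; \mathsf{Mod}' \Rightarrow \mathsf{Mod}$, i.e., a natural family of functors
$\beta_\Sigma : \mathsf{Mod}'(\Phi(\Sigma)) \to \mathsf{Mod}(\Sigma)$, and

3. a natural transformation $\alpha : \mathsf{sen} \Rightarrow \Phi; \mathsf{sen}'$, i.e., a natural family of functions
$\alpha_\Sigma : \mathsf{sen}(\Sigma) \to \mathsf{sen}'(\Phi(\Sigma))$,

such that the following satisfaction condition holds:

$$\beta_\Sigma(M') \models_\Sigma \varphi \iff M' \models'_{\Phi(\Sigma)} \alpha_\Sigma(\varphi)$$

for any $\Phi(\Sigma)$-model $M'$ in $\mathcal{S}'$ and $\Sigma$-sentence $\varphi$ in $\mathcal{S}$. If $\beta_\Sigma$ is surjective for each
signature $\Sigma$, then we say that $(\Phi, \beta, \alpha)$ is conservative. Usually, institution comorphisms
are used to express the representation (encoding) relationship. A simple example of a
comorphism is $(\Phi, \beta, \alpha) : \mathcal{S} \to \mathcal{S}^{th}$, where $\Phi : \mathsf{Sign} \to \mathsf{Th}$ is given by $\Phi(\Sigma) = (\Sigma, \emptyset)$,
$\beta : \Phi^{op}; \mathsf{Mod}^{th} \Rightarrow \mathsf{Mod}$ is defined such that $\beta_\Sigma$ is the identity, and $\alpha : \mathsf{sen} \Rightarrow \Phi; \mathsf{sen}^{th}$ is
defined such that $\alpha_\Sigma$ is the identity.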

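In many practical examples, we have to represent (encode) a sentence from the
source institution with a conjunction of sentences from the target institution. A simple
example is the representation of the equivalence $\varphi \leftrightarrow \varphi'$ by the conjunction of two Horn
rules: $\varphi \to \varphi' \wedge \varphi' \to \varphi$. Hence the following construction. The conjunction extension
of $\mathcal{S}$ is the institution $\mathcal{S}^\wedge = (\mathsf{Sign}, \mathsf{Mod}, \mathsf{sen}^\wedge, \models^\wedge)$, where $\mathsf{sen}^\wedge(\Sigma) = \mathsf{sen}(\Sigma) \cup \{\varphi_1 \wedge
\cdots \wedge \varphi_k \mid \varphi_1, \ldots, \varphi_k \in \mathsf{sen}(\Sigma)\}$, $M \models^\wedge_\Sigma \varphi$ iff $M \models_\Sigma \varphi$ for all $\varphi \in \mathsf{sen}(\Sigma)$, and
$M \models^\wedge_\Sigma \varphi_1 \wedge \cdots \wedge \varphi_k$ iff $M \models_\Sigma \varphi_i$ for $i = 1, \ldots, k$. There is an institution morphism
$(\Phi, \beta, \alpha) : \mathcal{S}^\wedge \to \mathcal{S}$ expressing the embedding of $\mathcal{S}$ in $\mathcal{S}^\wedge$. This embedding can also
be represented by a comorphism from $\mathcal{S}$ to $\mathcal{S}^\wedge$.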
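
The satisfaction relation of the conjunction extension can be sketched in a few lines. The Python fragment below assumes, purely for illustration, that the base sentences are propositional Horn rules and that models are sets of true atoms; the tagged-tuple encoding and the name `holds` are hypothetical.

```python
def holds(model, sentence):
    """Satisfaction in the conjunction extension of a toy Horn-rule logic.

    A base sentence is ("rule", body, head); a conjunction is
    ("and", s1, ..., sk) and holds iff every conjunct holds.
    """
    tag = sentence[0]
    if tag == "rule":
        _, body, head = sentence
        # A rule holds unless its whole body is true and its head is false.
        return not all(b in model for b in body) or head in model
    if tag == "and":
        return all(holds(model, s) for s in sentence[1:])
    raise ValueError(f"unknown sentence: {sentence!r}")

# The equivalence p <-> q encoded as a conjunction of two Horn rules.
equiv = ("and", ("rule", ("p",), "q"), ("rule", ("q",), "p"))

print(holds({"p", "q"}, equiv))
print(holds({"p"}, equiv))
```

The recursive definition also accepts nested conjunctions, which is slightly more liberal than the construction in the text, where only base sentences are conjoined.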

An indexed category is a functor $G : I^{op} \to \mathsf{Cat}$, where $I$ is a category of indices.
The Grothendieck category $G^\sharp$ of an indexed category $G : I^{op} \to \mathsf{Cat}$ has pairs $(i, \Sigma)$,
with $i$ an object in $I$ and $\Sigma$ an object in $G(i)$, as objects, and $(u, \varphi) : (i, \Sigma) \to (i', \Sigma')$,
with $u : i \to i'$ an arrow in $I$ and $\varphi : \Sigma \to G(u)(\Sigma')$ an arrow in $G(i)$, as arrows.

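The Grothendieck institution $\mathcal{S}^\sharp$ of an indexed institution $\mathcal{S} : I^{op} \to \mathsf{Ins}$ has

1. the Grothendieck category $\mathsf{Sign}^\sharp$ as its category of signatures, where $\mathsf{Sign} : I^{op} \to
\mathsf{Cat}$ is the indexed category of signatures of $\mathcal{S}$;

2. $\mathsf{Mod}^\sharp : (\mathsf{Sign}^\sharp)^{op} \to \mathsf{Cat}$ as its model functor, where $\mathsf{Mod}^\sharp(i, \Sigma) = \mathsf{Mod}^i(\Sigma)$
and $\mathsf{Mod}^\sharp(u, \varphi) = \beta^u_{\Sigma'}; \mathsf{Mod}^i(\varphi)$;

3. $\mathsf{sen}^\sharp : \mathsf{Sign}^\sharp \to \mathsf{Set}$ as its sentence functor, where $\mathsf{sen}^\sharp(i, \Sigma) = \mathsf{sen}^i(\Sigma)$; and

4. $M \models^\sharp_{(i, \Sigma)} \varphi$ iff $M \models^i_\Sigma \varphi$ for all $i \in |I|$, $\Sigma \in |\mathsf{Sign}^i|$, $M \in |\mathsf{Mod}^\sharp(i, \Sigma)|$, and
$\varphi \in \mathsf{sen}^\sharp(i, \Sigma)$;

where $\mathcal{S}(i) = (\mathsf{Sign}^i, \mathsf{Mod}^i, \mathsf{sen}^i, \models^i)$ for each index $i \in |I|$ and $\mathcal{S}(u) = (\Phi^u, \beta^u, \alpha^u)$
for each index morphism $u \in I$.
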
We show how a theory $(\Sigma_0, F_0)$ and a model constraint can define an institution
$\widetilde{(\Sigma_0, F_0)}$. A model constraint is a map $[\![\_]\!]$, which associates a subcategory $[\![\Sigma, F]\!] \subseteq
\mathsf{Mod}^{th}(\Sigma, F)$ with each theory $(\Sigma, F)$, such that $M'|_\phi \in [\![\Sigma, F]\!]$ for all $\phi : (\Sigma, F) \to
(\Sigma', F')$ and $M' \in [\![\Sigma', F']\!]$. Moreover, a model constraint implies in fact a
semantical extension in the following sense: $[\![\Sigma, F]\!] \subseteq \mathsf{Mod}^{th}(\Sigma, F)$ implies $[\![\Sigma, F]\!]^\bullet \supseteq
\mathsf{Mod}^{th}(\Sigma, F)^\bullet$, where $\mathcal{M}^\bullet$ denotes the set of sentences satisfied by all models in $\mathcal{M}$.
In other words, in the presence of model constraints we can prove more properties. The
constraints defined in [10] are a particular case of model constraints when the
subcategory can be syntactically represented. The institution $\widetilde{(\Sigma_0, F_0)}$ is defined as follows:

1. the category of signatures is the comma category $(\Sigma_0, F_0) \downarrow \mathsf{Th}$, where the objects
are theory morphisms $f : (\Sigma_0, F_0) \to (\Sigma, F)$, and the arrows $\phi : f \to f'$ are
theory morphisms $\phi : (\Sigma, F) \to (\Sigma', F')$ such that $f; \phi = f'$,

2. the model functor $\mathsf{Mod}(\widetilde{(\Sigma_0, F_0)})$ maps each signature $f : (\Sigma_0, F_0) \to (\Sigma, F)$ into
the subcategory $[\![\Sigma, F]\!]$,

3. the sentence functor $\mathsf{sen}(\widetilde{(\Sigma_0, F_0)})$ maps a signature $f : (\Sigma_0, F_0) \to (\Sigma, F)$ into the
set of $\Sigma$-sentences,

4. the satisfaction relation is defined by $M \models_f \varphi$ iff $M \models_\Sigma \varphi$.








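Note that the model constraint is required only for theories $(\Sigma, F)$ for which there exists a
theory morphism $f : (\Sigma_0, F_0) \to (\Sigma, F)$. We extend the above construction to diagrams
of theories and indexed institutions. Let $D : I \to \mathsf{Th}$ be a diagram of theories and
$([\![\_]\!]_i \mid i \in |I|)$ a model constraint such that if $u : i \to j$ is an arrow in $I$, then $M'|_{D(u)} \in
[\![D(i)]\!]_i$ for each $M' \in [\![D(j)]\!]_j$. We denote $D(i)$ by $(\Sigma_i, F_i)$, $i \in |I|$. If $u : j \to i$ is
an arrow in $I^{op}$, then there is an institution morphism $(\Phi, \beta, \alpha) : \widetilde{(\Sigma_j, F_j)} \to \widetilde{(\Sigma_i, F_i)}$,
where

1. $\Phi$ maps a signature $f : (\Sigma_j, F_j) \to (\Sigma, F)$ into the signature $D(u); f : (\Sigma_i, F_i) \to
(\Sigma, F)$;

2. $\beta : \mathsf{Mod}(\widetilde{(\Sigma_j, F_j)}) \Rightarrow \Phi^{op}; \mathsf{Mod}(\widetilde{(\Sigma_i, F_i)})$ is as follows: if $f : (\Sigma_j, F_j) \to (\Sigma, F)$ is a
signature in $\widetilde{(\Sigma_j, F_j)}$, then $\beta_f$ is the identity because $M|_{D(u); f}$ is a $\Sigma_i$-model by
functoriality of $\mathsf{Mod}(\mathcal{S})$ and by the fact that $D(u)$ and $f$ are theory morphisms;

3. $\alpha : \Phi; \mathsf{sen}(\widetilde{(\Sigma_i, F_i)}) \Rightarrow \mathsf{sen}(\widetilde{(\Sigma_j, F_j)})$ is as follows: if $f : (\Sigma_j, F_j) \to (\Sigma, F)$ is a signature
in $\widetilde{(\Sigma_j, F_j)}$, then $\alpha_f$ is the identity.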


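The diagram $D : I \to \mathsf{Th}$ together with the model constraint $([\![\_]\!]_i \mid i \in |I|)$ produces
an indexed institution $\widetilde{D} : I^{op} \to \mathsf{Ins}$, where $\mathsf{Ins}$ is the category of institutions and the
arrows are institution morphisms. We can now define the Grothendieck institution $\widetilde{D}^\sharp$,
where

1. the category of signatures $\mathsf{Sign}(\widetilde{D}^\sharp)$ is the Grothendieck construction $\mathsf{Sign}(\widetilde{D})^\sharp$
corresponding to $\mathsf{Sign}(\widetilde{D}) : I^{op} \to \mathsf{Cat}$, which maps each index $i$ into $\mathsf{Sign}(\widetilde{(\Sigma_i, F_i)})$;

2. the model functor $\mathsf{Mod}(\widetilde{D}^\sharp) : \mathsf{Sign}(\widetilde{D}^\sharp)^{op} \to \mathsf{Cat}$ is given by:
$\mathsf{Mod}(\widetilde{D}^\sharp)((i, f : (\Sigma_i, F_i) \to (\Sigma, F)))$ is $\mathsf{Mod}(\widetilde{(\Sigma_i, F_i)})(f)$ (that is equal to $[\![\Sigma, F]\!]_i$),
and if $(u, \phi) : (i, f : (\Sigma_i, F_i) \to (\Sigma, F)) \to (j, f' : (\Sigma_j, F_j) \to (\Sigma', F'))$, then
$\mathsf{Mod}(\widetilde{D}^\sharp)((u, \phi)) = \beta_{f'}(u); \mathsf{Mod}(\widetilde{(\Sigma_i, F_i)})(\phi)$, where $(\Phi, \beta, \alpha) : \widetilde{(\Sigma_j, F_j)} \to \widetilde{(\Sigma_i, F_i)}$;

3. the sentence functor $\mathsf{sen}(\widetilde{D}^\sharp) : \mathsf{Sign}(\widetilde{D}^\sharp) \to \mathsf{Set}$ is given by:
$\mathsf{sen}(\widetilde{D}^\sharp)((i, f : (\Sigma_i, F_i) \to (\Sigma, F)))$ is $\mathsf{sen}(\widetilde{(\Sigma_i, F_i)})(f)$ (that is equal to $\mathsf{sen}(\mathcal{S})(\Sigma)$),
and if $(u, \phi) : (i, f : (\Sigma_i, F_i) \to (\Sigma, F)) \to (j, f' : (\Sigma_j, F_j) \to (\Sigma', F'))$, then
$\mathsf{sen}(\widetilde{D}^\sharp)((u, \phi)) = \mathsf{sen}(\widetilde{(\Sigma_i, F_i)})(\phi); \alpha_{f'}(u)$, where $(\Phi, \beta, \alpha) : \widetilde{(\Sigma_j, F_j)} \to \widetilde{(\Sigma_i, F_i)}$;

4. if $f : (\Sigma_i, F_i) \to (\Sigma, F)$, $M \in \mathsf{Mod}(\widetilde{D}^\sharp)((i, f))$ and $\varphi \in \mathsf{sen}(\widetilde{D}^\sharp)((i, f))$, then
$M \models_{(i, f)} \varphi$ iff $M \models_f \varphi$.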








This construction will be used to formalize the RDF triple-based logics underlying SW 
languages. For instance, it is useful to combine ontologies described in various SW 
languages. 


3 RDF and RDF Schema Logics 


In this section, we define the institutions for the languages RDF and RDF Schema.
The construction of these institutions is divided into three steps. First, we construct a
bare-bone institution for RDF logic, capturing only the most essential concepts in RDF,
namely resource references and the triple format. This logic then serves as the basis
on which the institutions of the actual RDF and RDF Schema are constructed. In turn,
the institutions defined in this section serve to define the RDF serialization of the ontology
languages defined in Section 4.


3.1 Bare RDF Logic BRDF 


As introduced in Section 2.1, the Resource Description Framework (RDF) is the
foundation language of the Semantic Web and all upper layer languages are based on it.
Hence, they are all based on the syntax defined in RDF, that is, the triple format.
Together with the use of URIs for resource referencing, this feature of the RDF
language is common to all other languages. Hence, we extract these two features from the RDF
language and define an institution, the bare RDF logic BRDF.


Example 3. Since resource references are the only signatures in BRDF, any triple will 
be part of the sentences. As BRDF is a bare-bone RDF institution, it does not define 
the XML serialization presented in the previous two examples. Therefore, we will use 
the informal syntax in this example. Note that the separator # is replaced by a : in this 
notation. The following triple is a legal sentence in BRDF, stating that carnivores eat 
animals. 


(animals:Carnivore, animals:eats, animals:Animal)

The Bare RDF logic BRDF is a bare-bone institution with resource references as the
only signatures. The sentences are triples. BRDF is not expressive at all. We use it as
a basis upon which we develop the Grothendieck institution of the triple-based logics
underlying SW languages.

A signature $RR$ in BRDF is a set of resource references. A signature morphism $\phi :
RR \to RR'$ is an arrow in $\mathsf{Set}$. The $RR$-sentences are triples of the form $(sn, pn, on)$,
where $sn, pn, on \in RR$. Usually, $sn$ is for subject name, $pn$ is for property (predicate)
name, and $on$ is for object name. $RR$-models $I$ are tuples $I = (R_I, P_I, S_I, \mathit{ext}_I)$, where
$R_I$ is a set of resources, $P_I$ is a subset of $R_I$ ($P_I \subseteq R_I$), the set of properties, $S_I :
RR \to R_I$ is a mapping function that maps each resource reference to some resource,
and $\mathit{ext}_I : P_I \to \mathcal{P}(R_I \times R_I)$ is an extension function mapping each property to a
set of pairs of resources that it relates. An $RR$-homomorphism $h : I \to I'$ between
two $RR$-models is a function $h : R_I \to R_{I'}$ such that $h(P_I) \subseteq P_{I'}$, $S_I; h = S_{I'}$, and
$\mathit{ext}_I; \mathcal{P}(h \times h) = h|_{P_I}; \mathit{ext}_{I'}$. The satisfaction is defined as follows:

$$I \models_{RR} (sn, pn, on) \iff (S_I(sn), S_I(on)) \in \mathit{ext}_I(S_I(pn)),$$

that is, $(sn, pn, on)$ is satisfied if and only if the pair consisting of the resources associated
with the subject name $sn$ and the object name $on$ is in the extension of $pn$.
In order to simplify the notation, we often write $\mathit{ext}_I(pn)$ instead of $\mathit{ext}_I(S_I(pn))$.
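
The BRDF satisfaction relation is simple enough to execute directly. The following Python sketch encodes a model as a small data class; the class name `BRDFModel`, its field names, and the resource identifiers such as `r_eats` are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class BRDFModel:
    resources: set                            # R_I
    properties: set                           # P_I, a subset of R_I
    denote: dict                              # S_I : RR -> R_I
    ext: dict = field(default_factory=dict)   # ext_I : P_I -> set of pairs

    def satisfies(self, triple):
        """I |= (sn, pn, on) iff (S_I(sn), S_I(on)) is in ext_I(S_I(pn))."""
        sn, pn, on = triple
        prop = self.denote[pn]
        # Resources outside P_I have no extension, so the triple fails.
        pairs = self.ext.get(prop, set())
        return (self.denote[sn], self.denote[on]) in pairs

model = BRDFModel(
    resources={"r_carnivore", "r_animal", "r_eats"},
    properties={"r_eats"},
    denote={
        "animals:Carnivore": "r_carnivore",
        "animals:Animal": "r_animal",
        "animals:eats": "r_eats",
    },
    ext={"r_eats": {("r_carnivore", "r_animal")}},
)

print(model.satisfies(("animals:Carnivore", "animals:eats", "animals:Animal")))
```

The triple of Example 3 holds in this model, while the reversed triple does not, because the pair of resources is ordered.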





3.2 RDF Logic RDF 


The RDF logic RDF is constructed with BRDF as the basis. The addition in RDF is the
built-in vocabulary of the RDF language and the semantics of these language
constructs. Hence, as shown below, we denote the signatures of RDF using theories, which
consist of this built-in vocabulary and sentences giving it semantics. We also add
some weak model constraints. More precisely, RDF is defined using the construction we
defined in Section 2.2, starting from a theory RDF and a model constraint $[\![\_]\!]_{\mathsf{RDF}}$.

The RDF theory is $\mathsf{RDF} = (\mathsf{RDFVoc}, T_{\mathsf{RDF}})$, where the RDF vocabulary $\mathsf{RDFVoc}$
includes the following items:


rdf:type, rdf:Property, rdf:value,
rdf:Statement, rdf:subject, rdf:predicate, rdf:object,
rdf:List, rdf:first, rdf:rest, rdf:nil,
rdf:Seq, rdf:Bag, rdf:Alt, rdf:_1, rdf:_2


and $T_{\mathsf{RDF}}$ consists of triples expressing properties of the vocabulary symbols:


(rdf:type, rdf:type, rdf:Property),
(rdf:subject, rdf:type, rdf:Property),
(rdf:predicate, rdf:type, rdf:Property),
(rdf:object, rdf:type, rdf:Property),
(rdf:value, rdf:type, rdf:Property),
(rdf:first, rdf:type, rdf:Property),
(rdf:rest, rdf:type, rdf:Property),
(rdf:nil, rdf:type, rdf:List),
(rdf:_1, rdf:type, rdf:Property),
(rdf:_2, rdf:type, rdf:Property),


Note that the above vocabulary items such as rdf:type are all shorthands for legal
URIs, as described in Section 2. All the above triples are self-explanatory. For instance,
the triple (rdf:value, rdf:type, rdf:Property) states that rdf:value
is a property.

We suppose that there is a given set $R_{\mathsf{RDF}}$ of RDF resources and a function $S_{\mathsf{RDF}} :
\mathsf{RDFVoc} \to R_{\mathsf{RDF}}$ which associates a resource with each RDF symbol. It is easy to see
that $R_{\mathsf{RDF}}$ and $S_{\mathsf{RDF}}$ can be extended to an RDF-model.

For each theory $(RR, T)$ such that there is a theory morphism $f : \mathsf{RDF} \to (RR, T)$, we
consider the model constraint $[\![RR, T]\!]_{\mathsf{RDF}}$ as consisting of those $(RR, T)$-models $I$
such that

- $R_I$ includes $R_{\mathsf{RDF}}$ and the restriction of $S_I$ to $\mathsf{RDFVoc}$ coincides with $S_{\mathsf{RDF}}$,
- if $p \in P_I$, then $(p, S_I(\mathtt{rdf{:}Property})) \in \mathit{ext}_I(\mathtt{rdf{:}type})$.

Since $f$ is a theory morphism, the restriction of $I$ to $\mathsf{RDFVoc}$ is an RDF-model. We denote
by $\widetilde{\mathsf{RDF}}$ the institution defined by the theory RDF together with the model constraint
$[\![\_]\!]_{\mathsf{RDF}}$ using the method presented in Section 2.2.


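If we denote by $\widetilde{(\emptyset, \emptyset)}$ the institution defined by the theory $(\emptyset, \emptyset)$ and the model
constraint $[\![RR, T]\!]_\emptyset = \mathsf{Mod}(\mathsf{BRDF})^{th}(RR, T)$, then $\mathsf{BRDF}^{th}$ is isomorphic to $\widetilde{(\emptyset, \emptyset)}$
and we have the institution morphisms $\widetilde{\mathsf{RDF}} \to \mathsf{BRDF}^{th} \to \mathsf{BRDF}$.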


3.3 The RDF Schema Logic RDFS 


RDF Schema defines additional language constructs for the RDF language. It expands
the expressiveness of RDF by introducing the concept of a universe of resources
(rdfs:Resource), the classification mechanism (rdfs:Class) and a set of properties that
relate them (rdfs:subClassOf, rdfs:domain, rdfs:range). Hence, it is
natural for the RDFS institution RDFS to be developed on top of RDF, with some more
model constraints added to capture the semantics of the RDFS language constructs.





Example 4. Example 1 defines the subclass relationship between two RDF Schema
classes. It can be represented in the informal shorthand syntax as follows.


(animals:Carnivore, rdfs:subClassOf, animals:Animal) 


The RDF Schema theory $\mathsf{RDFS} = (\mathsf{RDFSVoc}, T_{\mathsf{RDFS}})$ is composed of the RDF
Schema vocabulary $\mathsf{RDFSVoc}$ including $\mathsf{RDFVoc}$ together with


rdfs:domain, rdfs:range, rdfs:Resource, 
rdfs:Literal, rdfs:Datatype, rdfs:Class, 
rdfs:subClassOf, rdfs:subPropertyOf, rdfs:member, 
rdfs:Container, rdfs:ContainerMembershipProperty 


and the sentences $T_{\mathsf{RDFS}}$ including $T_{\mathsf{RDF}}$ together with the triples setting the properties of the
new symbols (for the whole list of triples see [13]):


(rdf:type, rdfs:domain, rdfs:Resource),
(rdfs:domain, rdfs:domain, rdf:Property),
(rdfs:range, rdfs:domain, rdf:Property),
(rdfs:subPropertyOf, rdfs:domain, rdf:Property),
(rdfs:subClassOf, rdfs:domain, rdfs:Class),

(rdf:type, rdfs:range, rdfs:Class),
(rdfs:domain, rdfs:range, rdfs:Class),
(rdfs:range, rdfs:range, rdfs:Class),
(rdfs:subPropertyOf, rdfs:range, rdf:Property),
(rdfs:subClassOf, rdfs:range, rdfs:Class),

We suppose that there is a given set $R_{\mathsf{RDFS}}$ of RDF Schema resources and a
function $S_{\mathsf{RDFS}} : \mathsf{RDFSVoc} \to R_{\mathsf{RDFS}}$ which associates a resource with each RDF Schema
symbol and that satisfies $S_{\mathsf{RDFS}}|_{\mathsf{RDFVoc}} = S_{\mathsf{RDF}}$.

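For each theory $(RR, T)$ such that there is a theory morphism $f : \mathsf{RDFS} \to (RR, T)$, we
define the model constraint $[\![RR, T]\!]_{\mathsf{RDFS}}$, obtained by strengthening $[\![RR, T]\!]_{\mathsf{RDF}}$ with
the following conditions. If $I \in [\![RR, T]\!]_{\mathsf{RDFS}}$, then:

- $R_I$ includes $R_{\mathsf{RDFS}}$ and the restriction of $S_I$ to $\mathsf{RDFSVoc}$ coincides with $S_{\mathsf{RDFS}}$
- $\mathit{ext}_I(\mathtt{rdfs{:}Resource}) = R_I$
- $(\forall x, y, u, v : R_I)\ (x, y) \in \mathit{ext}_I(\mathtt{rdfs{:}domain}) \wedge (u, v) \in \mathit{ext}_I(x) \Rightarrow u \in \mathit{ext}_I(y)$
- $(\forall x, y, u, v : R_I)\ (x, y) \in \mathit{ext}_I(\mathtt{rdfs{:}range}) \wedge (u, v) \in \mathit{ext}_I(x) \Rightarrow v \in \mathit{ext}_I(y)$
- $(\forall x, y : R_I)\ (x, y) \in \mathit{ext}_I(\mathtt{rdfs{:}subClassOf}) \Rightarrow \mathit{ext}_I(x) \subseteq \mathit{ext}_I(y)$
- $(\forall x : \mathit{ext}_I(\mathtt{rdfs{:}Class}))\ (x, S_I(\mathtt{rdfs{:}Resource})) \in \mathit{ext}_I(\mathtt{rdfs{:}subClassOf})$
- $(\forall x, y : R_I)\ (x, y) \in \mathit{ext}_I(\mathtt{rdfs{:}subPropertyOf}) \Rightarrow \mathit{ext}_I(x) \subseteq \mathit{ext}_I(y)$
- $(\forall x : \mathit{ext}_I(\mathtt{rdfs{:}ContainerMembershipProperty}))\ (x, S_I(\mathtt{rdfs{:}member})) \in \mathit{ext}_I(\mathtt{rdfs{:}subPropertyOf})$

In other words, $[\![\_]\!]_{\mathsf{RDFS}}$ gives the intended semantics to syntactic constructions such as
domain, range, subClassOf, subPropertyOf, etc.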
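
Two of these conditions, the ones for rdfs:subClassOf and rdfs:domain, can be checked on a finite interpretation as follows. This Python sketch makes simplifying assumptions that are not in the paper: resources are identified with their names (so the denotation map is the identity), and a single dictionary `ext` holds both class extensions (sets of resources) and property extensions (sets of pairs). The function names and the sample data are hypothetical.

```python
def check_subclass(ext, sub_class_of="rdfs:subClassOf"):
    """(x, y) in ext(subClassOf) implies ext(x) is a subset of ext(y)."""
    return all(ext.get(x, set()) <= ext.get(y, set())
               for (x, y) in ext.get(sub_class_of, set()))

def check_domain(ext, domain="rdfs:domain"):
    """(x, y) in ext(domain) and (u, v) in ext(x) imply u in ext(y)."""
    return all(u in ext.get(y, set())
               for (x, y) in ext.get(domain, set())
               for (u, _v) in ext.get(x, set()))

ext = {
    "rdfs:subClassOf": {("Carnivore", "Animal")},
    "rdfs:domain": {("eats", "Animal")},
    "Carnivore": {"leo"},
    "Animal": {"leo", "zebra"},
    "eats": {("leo", "zebra")},
}

# A variant in which a non-animal appears as the subject of eats.
bad_ext = dict(ext, eats={("rock", "zebra")})

print(check_subclass(ext), check_domain(ext), check_domain(bad_ext))
```

The remaining conditions (range, subPropertyOf, and the two universally quantified membership conditions) follow the same pattern of iterating over the relevant extension and testing membership or inclusion.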

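We denote by $\widetilde{\mathsf{RDFS}}$ the institution such that $\widetilde{\mathsf{RDFS}} \to \widetilde{\mathsf{RDF}}$ is the indexed institution
produced by the diagram $\mathsf{RDF} \to \mathsf{RDFS}$ together with the model constraint $[\![\_]\!]_{\mathsf{RDFS}}$.
We have the theory morphisms (inclusions) $(\emptyset, \emptyset) \to \mathsf{RDF} \to \mathsf{RDFS}$ and $[\![\_]\!]_\emptyset \supseteq
[\![\_]\!]_{\mathsf{RDF}} \supseteq [\![\_]\!]_{\mathsf{RDFS}}$. We can now formalize the logics underlying the RDF framework
as the Grothendieck institution defined by the indexed institution:

$$\widetilde{\mathsf{RDFS}} \longrightarrow \widetilde{\mathsf{RDF}} \longrightarrow \mathsf{BRDF}^{th} \longrightarrow \mathsf{BRDF}$$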


4 Semantic Web Logics 


A number of ontology languages have been proposed in the past years. These include
the OWL suite of languages, the OWL⁻ suite of languages, OWL Flight, etc. They
are all based on RDF and RDFS but impose different restrictions on the usage of
RDF(S) language constructs. Hence, their expressiveness is different. In this section,
we construct institutions in an RDF(S)-independent way for some of these languages
and inter-relate them using institution morphisms. Then we relate them to RDF(S) by
exhibiting the comorphisms defining the RDF serializations. These institutions are
incrementally constructed using the same pattern. Therefore we present more details only
for the first (smallest) one.


4.1 OWL Lite⁻ Logic OWLLite⁻


OWL Lite⁻ is a proper subset of OWL Lite (see the next subsection) that can be
translated into Datalog. It is obtained from OWL Lite by removing those features
considered hard to reason about. OWL Lite⁻ is the lightest dialect of SW languages and
therefore we start with it. We denote by OWLLite⁻ the institution of the ontology
language OWL Lite⁻.


Example 5. The class subsumption relationship is allowed in OWL Lite⁻ as long as
neither of the classes is top ($\top$, the superclass of all classes) or bottom ($\bot$, the
subclass of all classes, i.e., the empty class). Moreover, allValuesFrom restrictions
like that mentioned in Example 2 are also allowed in OWL Lite⁻. Hence, Example 2 is
also an OWL Lite⁻ fragment.


The signatures of OWLLite⁻ are triples $\Sigma = (CN, PN, IN)$, where $CN$ is a set of
class names, $PN$ is a set of individual property names, and $IN$ is a set of individual
names. We assume that $CN$, $PN$, and $IN$ are pairwise disjoint. A signature morphism
$\phi : \Sigma \to \Sigma'$ is a function $\phi : CN \cup PN \cup IN \to CN' \cup PN' \cup IN'$ such that
$\phi(CN) \subseteq CN'$, $\phi(PN) \subseteq PN'$, $\phi(IN) \subseteq IN'$, where $\Sigma' = (CN', PN', IN')$.

A $\Sigma$-model $I$ consists of $(R_I, S_I, \mathit{ext}_I)$, where $R_I$ is a set of resources, $S_I : CN \cup
PN \cup IN \to R_I$ is a map such that $S_I(CN)$, $S_I(PN)$ and $S_I(IN)$ are pairwise disjoint,
and $\mathit{ext}_I$ is a map associating a subset of $R_I$ with each class name $cn \in CN$, and a subset
of $R_I \times R_I$ with each property name $pn$. A $\Sigma$-homomorphism $h : I \to I'$ between two
$\Sigma$-models is a function $h : R_I \to R_{I'}$ such that $S_I; h = S_{I'}$, $\mathit{ext}_I|_{CN}; \mathcal{P}(h) = \mathit{ext}_{I'}|_{CN}$,
and $\mathit{ext}_I|_{PN}; \mathcal{P}(h \times h) = \mathit{ext}_{I'}|_{PN}$.

For class expressions and $\Sigma$-sentences we use a more compact notation:


Res ::= restriction(pn allValuesFrom cn) |
        restriction(pn minCardinality(0))
C   ::= cn | Res
S   ::= Class(cn partial C_1 ... C_k) | Class(cn complete cn_1 ... cn_k) |
        EquivalentClasses(cn_1 ... cn_k) |
        ObjectProperty(pn super(pn_1) ... super(pn_k)) |
        pn.domain(cn_1 ... cn_k) | pn.range(cn_1 ... cn_k) | pn.inverseOf(pn_1) |
        pn.Symmetric | pn.Transitive |
        SubProperty(pn_1 pn_2) | EquivalentProperties(pn_1 ... pn_k) |
        Individual(in type(cn_1) ... type(cn_k)) | in.value(pn in_1)

The semantics of expressions is given by:

$$\mathit{ext}_I(\mathtt{restriction}(pn\ \mathtt{allValuesFrom}\ cn)) = \{x \mid (\forall y)\,(x, y) \in \mathit{ext}_I(pn) \Rightarrow y \in \mathit{ext}_I(cn)\}$$

$$\mathit{ext}_I(\mathtt{restriction}(pn\ \mathtt{minCardinality}(0))) = \{x \mid \#(\{y \mid (x, y) \in \mathit{ext}_I(pn)\}) \geq 0\}$$


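The satisfaction relation between OWL Lite⁻ $\Sigma$-models $I$ and OWL Lite⁻ $\Sigma$-sentences
is defined as intuitively suggested by the syntax. For instance, we have:

$I \models_\Sigma \mathtt{Class}(cn\ \mathtt{partial}\ C_1 \ldots C_k)$ iff $\mathit{ext}_I(cn) \subseteq \mathit{ext}_I(C_1) \cap \cdots \cap \mathit{ext}_I(C_k)$
$I \models_\Sigma \mathtt{ObjectProperty}(pn\ \mathtt{super}(pn_1) \ldots \mathtt{super}(pn_k))$ iff $\mathit{ext}_I(pn) \subseteq \mathit{ext}_I(pn_1) \cap \cdots \cap \mathit{ext}_I(pn_k)$
$I \models_\Sigma pn.\mathtt{domain}(cn_1 \ldots cn_k)$ iff $\mathrm{dom}\,\mathit{ext}_I(pn) \subseteq \mathit{ext}_I(cn_1) \cap \cdots \cap \mathit{ext}_I(cn_k)$
$I \models_\Sigma \mathtt{SubProperty}(pn_1\ pn_2)$ iff $\mathit{ext}_I(pn_1) \subseteq \mathit{ext}_I(pn_2)$
$I \models_\Sigma \mathtt{Individual}(in\ \mathtt{type}(cn_1) \ldots \mathtt{type}(cn_k))$ iff $S_I(in) \in \mathit{ext}_I(cn_1) \cap \cdots \cap \mathit{ext}_I(cn_k)$
$I \models_\Sigma in.\mathtt{value}(pn\ in_1)$ iff $(S_I(in), S_I(in_1)) \in \mathit{ext}_I(pn)$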
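
The semantics of the allValuesFrom restriction and the satisfaction of a partial class axiom can be sketched as below. The Python encoding is an assumption made for illustration: class expressions are either class names (strings) or tuples of the form `("allValuesFrom", pn, cn)`, extensions are kept in a plain dictionary, and the sample individuals are invented.

```python
def ext_of(expr, resources, ext):
    """Extension of a class name or of an allValuesFrom restriction."""
    if isinstance(expr, str):
        return ext[expr]
    kind, pn, cn = expr
    if kind == "allValuesFrom":
        # x qualifies iff every pn-successor of x belongs to ext(cn).
        return {x for x in resources
                if all(y in ext[cn] for (x2, y) in ext[pn] if x2 == x)}
    raise ValueError(f"unsupported restriction: {kind}")

def sat_class_partial(cn, exprs, resources, ext):
    """I |= Class(cn partial C_1 ... C_k) iff ext(cn) is in every ext(C_i)."""
    return all(ext[cn] <= ext_of(e, resources, ext) for e in exprs)

resources = {"leo", "zebra", "grass"}
ext = {
    "Animal": {"leo", "zebra"},
    "Carnivore": {"leo"},
    "eats": {("leo", "zebra"), ("zebra", "grass")},
}
only_eats_animals = ("allValuesFrom", "eats", "Animal")

print(sat_class_partial("Carnivore", ["Animal", only_eats_animals],
                        resources, ext))
```

Note that a resource with no successors at all satisfies the restriction vacuously, which is why `grass` belongs to the extension of `only_eats_animals` in this example.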











4.2 OWL Lite Logic OWLLite


OWL Lite is the least expressive species of the OWL suite. It is obtained by imposing
some constraints on OWL Full. These constraints include, for example, that the sets
of classes, properties and individuals are mutually disjoint; that min and max
cardinality restrictions can only be applied to the numbers 0 and 1; and that value restrictions such
as allValuesFrom and someValuesFrom can only be applied to named classes.
Compared with OWL Lite⁻, OWL Lite is more expressive since it removes some
constraints that are imposed on OWL Lite⁻. The details are discussed below. We
denote by OWLLite the institution of OWL Lite.


Example 6. OWL Lite⁻ does not support the relationships between OWL
individuals, namely sameAs and differentFrom, whereas these features are
present in OWL Lite. Suppose that we have two URI references for carnivores, car1 and
car2, which actually refer to the same animal. We use the following code
fragment to represent this piece of knowledge:


<animals:Carnivore rdf:ID="car1"/>
<animals:Carnivore rdf:ID="car2"/>
<animals:Carnivore rdf:about="#car1">
  <owl:sameAs rdf:resource="http://ex.com/animals#car2"/>
</animals:Carnivore>




OWLLite is obtained from OWLLite⁻ by replacing the definition of expressions with


Res ::= restriction(pn allValuesFrom cn) |
        restriction(pn someValuesFrom cn) |
        restriction(pn minCardinality(n)) |
        restriction(pn maxCardinality(n))
C   ::= cn | owl:Thing | owl:Nothing | Res


where $n \in \{0, 1\}$, and adding the following sentences:


pn.Functional | pn.InverseFunctional |
SameIndividual(in_1, ..., in_k) | DifferentIndividuals(in_1, ..., in_k)


The semantics of the new expressions is as follows:

$\mathit{ext}_I(\mathtt{owl{:}Thing})$ is a subset of $R_I$ s.t. $(\forall cn \in CN)\ \mathit{ext}_I(cn) \subseteq \mathit{ext}_I(\mathtt{owl{:}Thing})$,
$\mathit{ext}_I(\mathtt{owl{:}Nothing}) = \emptyset$,
$\mathit{ext}_I(\mathtt{restriction}(pn\ \mathtt{someValuesFrom}\ cn)) = \{x \mid (\exists y)\,(x, y) \in \mathit{ext}_I(pn) \wedge y \in \mathit{ext}_I(cn)\}$,
$\mathit{ext}_I(\mathtt{restriction}(pn\ \mathtt{minCardinality}(1))) = \{x \mid \#\{y \mid (x, y) \in \mathit{ext}_I(pn)\} \geq 1\}$,
$\mathit{ext}_I(\mathtt{restriction}(pn\ \mathtt{maxCardinality}(1))) = \{x \mid \#\{y \mid (x, y) \in \mathit{ext}_I(pn)\} \leq 1\}$.

The satisfaction of the new sentences is intuitive and straightforward and we omit its
formal definition.
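
The three new restriction semantics are set comprehensions and translate directly into code. In the following Python sketch the function names and the sample data are hypothetical; a property extension is a set of pairs and a class extension is a set of resources.

```python
def successors(x, pairs):
    """All y such that (x, y) is in the property extension."""
    return {y for (x2, y) in pairs if x2 == x}

def some_values_from(resources, pairs, cls):
    """Resources with at least one successor in the class extension."""
    return {x for x in resources if successors(x, pairs) & cls}

def min_cardinality(resources, pairs, n):
    """Resources with at least n distinct successors."""
    return {x for x in resources if len(successors(x, pairs)) >= n}

def max_cardinality(resources, pairs, n):
    """Resources with at most n distinct successors."""
    return {x for x in resources if len(successors(x, pairs)) <= n}

resources = {"leo", "zebra", "grass"}
eats = {("leo", "zebra"), ("zebra", "grass")}
animal = {"leo", "zebra"}

print(some_values_from(resources, eats, animal))
print(min_cardinality(resources, eats, 1))
print(max_cardinality(resources, eats, 1))
```

Although OWL Lite only admits the bounds 0 and 1, the functions take an arbitrary `n`, since the definition does not change for larger bounds.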


Proposition 1. There is a conservative comorphism from OWLLite⁻ to OWLLite.


4.3 OWL DL⁻ Logic OWLDesLog⁻


OWL DL⁻ is an extension of OWL Lite⁻ and a subset of OWL DL (see the next
subsection) which can also be translated into Datalog. We denote by OWLDesLog⁻ the
institution of OWL DL⁻. Compared to OWL Lite⁻, OWL DL⁻ allows the additional
language constructs value, someValuesFrom and oneOf, although the latter two
are only allowed as the first argument of subClassOf (left-hand side).


Example 7. In OWL DL⁻, the value restriction, which is not present in OWL Lite, is
allowed. This restriction constructs a class each of whose instances must have, for a
given property, a particular individual among the values of that property. Suppose that
we want to model the fact that Adam is an ancestor of every human (among all his or her
other ancestors), assuming that we have defined an individual
Adam and a property hasAncestor. Here is the definition in OWL DL⁻.


<owl:Class rdf:ID="Human"> 
<rdfs:subClassOf><owl:Restriction> 
<owl:onProperty rdf:resource="#hasAncestor"/> 
<owl:hasValue rdf:resource="#Adam"/> 
</owl:Restriction></rdfs:subClassOf> 
</owl:Class> 

OWLDesLog⁻ is obtained from OWLLite⁻ by replacing the definition of expressions
with


C       ::= cn | Res | intersectionOf(C_1, ..., C_n)
Lhs_D   ::= C | Lhs_Res | unionOf(Lhs_D, ..., Lhs_D) | oneOf(in_1, ..., in_k)
Rhs_D   ::= C | Rhs_Res
Res     ::= restriction(pn value(in))
Lhs_Res ::= Res | restriction(pn someValuesFrom(Lhs_D))
          | restriction(pn minCardinality(1))
Rhs_Res ::= Res | restriction(pn allValuesFrom(Rhs_D))

and replacing the class-related sentences with the following:


Class(cn partial Rhs_D) | Class(cn complete C) |
EquivalentClass(C_1, ..., C_n) | subClassOf(Lhs_D, Rhs_D)


Note the use of class expressions instead of named classes. Lhs_D and Rhs_D
represent class descriptions that can only appear on the left-hand side and right-hand side of
the subClassOf sentence, respectively.

The semantics of the value restriction is $\mathit{ext}_I(\mathtt{restriction}(pn\ \mathtt{value}\ in)) = \{x \mid
(x, S_I(in)) \in \mathit{ext}_I(pn)\}$. The semantics of the other expressions and the satisfaction
relation for the new sentences are defined as expected.


Proposition 2. There is a conservative comorphism from OWLLite⁻ to OWLDesLog⁻.


4.4 OWL DL Logic OWLDesLog


OWL DL is the main ontology language of the OWL suite. It is more expressive than
OWL Lite yet still decidable. It relaxes some constraints imposed on OWL Lite and
allows more language constructs to describe classes and properties. Still, classes,
properties and individuals are mutually disjoint in OWL DL. We denote by OWLDesLog the
institution of OWL DL. Compared to OWL DL⁻, OWL DL adds a number of language
features, such as enumerated classes, disjoint classes, functional properties, etc.


Example 8. The class Continents defines the continents of the Earth, namely Africa, 
Antarctica, Asia, Australia, Europe, North America and South America. As this class 
only contains these 7 instances, it is natural to use an enumeration to define it. 


<owl:Class rdf:ID="Continents">
  <owl:oneOf rdf:parseType="Collection">
    <owl:Thing rdf:about="#Africa"/>
    <owl:Thing rdf:about="#Antarctica"/>
    <owl:Thing rdf:about="#Asia"/>
    <owl:Thing rdf:about="#Australia"/>
    <owl:Thing rdf:about="#Europe"/>
    <owl:Thing rdf:about="#North America"/>
    <owl:Thing rdf:about="#South America"/>
  </owl:oneOf>
</owl:Class>





OWLDesLog is obtained from OWLLite by replacing the definition of expressions with


Res ::= restriction(pn value in) |
        restriction(pn allValuesFrom C) |
        restriction(pn someValuesFrom C) |
        restriction(pn minCardinality(n)) |
        restriction(pn maxCardinality(n))
C   ::= cn | owl:Thing | owl:Nothing | Res |
        intersectionOf(C_1, ..., C_k) | unionOf(C_1, ..., C_k) |
        complementOf(C) | oneOf(in_1, ..., in_k)


and adding the following sentences: 


EnumeratedClass(cn in1,...,ink) | SubClassOf(C1, C2) |
DisjointClasses(C1,...,Ck) | EquivalentClasses(C1,...,Ck) |
pn.domain(C1...Ck) | pn.range(C1...Ck) |
Individual(in type(C1),...,type(Ck)) | in.value(pn in)


The semantics of the new expressions and the satisfaction relation for the new sentences 
are defined as expected. 
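
To make "defined as expected" concrete, the following Python sketch evaluates class expressions over a finite interpretation. It is illustrative only: the tuple encoding of expressions, the dictionary layout of the interpretation (`R`, `S`, `cext`, `pext`) and the function names are assumptions introduced here, and the complement is taken relative to the whole resource set `R`.

```python
def successors(I, pn, x):
    """All y such that (x, y) is in the extension of property pn."""
    return {y for (x2, y) in I["pext"][pn] if x2 == x}


def ext(expr, I):
    """Extension of a class expression in a finite interpretation I.

    Hypothetical layout: I["R"] is the set of resources, I["S"] maps
    individual names to resources, I["cext"] maps class names to sets,
    I["pext"] maps property names to sets of pairs.
    """
    R = I["R"]
    kind = expr[0]
    if kind == "cn":
        return set(I["cext"][expr[1]])
    if kind == "owl:Thing":
        return set(R)
    if kind == "owl:Nothing":
        return set()
    if kind == "intersectionOf":
        result = set(R)
        for c in expr[1:]:
            result &= ext(c, I)
        return result
    if kind == "unionOf":
        result = set()
        for c in expr[1:]:
            result |= ext(c, I)
        return result
    if kind == "complementOf":
        return set(R) - ext(expr[1], I)
    if kind == "oneOf":
        return {I["S"][name] for name in expr[1:]}
    if kind == "value":  # restriction(pn value in)
        _, pn, ind = expr
        return {x for x in R if I["S"][ind] in successors(I, pn, x)}
    if kind == "allValuesFrom":
        _, pn, c = expr
        target = ext(c, I)
        return {x for x in R if successors(I, pn, x) <= target}
    if kind == "someValuesFrom":
        _, pn, c = expr
        target = ext(c, I)
        return {x for x in R if successors(I, pn, x) & target}
    if kind == "minCardinality":
        _, pn, n = expr
        return {x for x in R if len(successors(I, pn, x)) >= n}
    if kind == "maxCardinality":
        _, pn, n = expr
        return {x for x in R if len(successors(I, pn, x)) <= n}
    raise ValueError("unknown expression kind: %r" % (kind,))
```

For instance, `ext(("oneOf", "Africa", "Europe"), I)` returns the resources that `I["S"]` assigns to the two individual names, mirroring the enumeration of Example 8.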


Proposition 3. a) There is a conservative comorphism from OWLLite to OWLDesLog.

b) There is a conservative comorphism from OWLDesLog$^-$ to OWLDesLog.


4.5 OWL Full Logic OWLFull 


OWL Full adds a number of features on top of OWL DL and also removes some re- 
strictions. The vocabulary no longer needs to be separated. This means an identifier can 
denote a class, an individual and/or a property at the same time. 

Let $X$ be a name denoting a class and a property in the same ontology
$O = ((CN, PN, IN), F)$, i.e., $X \in CN \cap PN$, and let $I$ be a $(\Sigma, F)$-model,
where $\Sigma = (CN, PN, IN)$. For the moment, we denote with $X{:}CN$ the occurrence of
$X$ as a class and with $X{:}PN$ the occurrence of $X$ as a property. We have
$S_I(X{:}CN) = S_I(X{:}PN) = S_I(X)$, $ext_I(X{:}CN) \subseteq R_I$, and
$ext_I(X{:}PN) \subseteq R_I \times R_I$. Since $X$ denotes just one entity, we relate
the two sets by means of a bijection $rdef_I(X) : ext_I(X{:}PN) \to ext_I(X{:}CN)$. We may
think that $rdef_I(X)(r_1, r_2)$ is the URL address where the pair $(r_1, r_2)$ is defined
as an instance of $ext_I(X{:}PN)$. If $X$ denotes a class (property) and an individual,
then its meaning as an individual is given by $S_I(X)$ and its meaning as class
(property) is $ext_I(X)$.

Also, keywords of the language can be used in place of classes, properties and
individuals, and restrictions. For instance, we may assume that subClassOf and
subPropertyOf are in $PN$. Then for $X$, $Y$ denoting both classes and properties,
we have subClassOf($X$, $Y$) iff subPropertyOf($X$, $Y$); this is semantically
expressed by $(S_I(X{:}CN), S_I(Y{:}CN)) \in ext_I(\text{subClassOf})$ iff
$(S_I(X{:}PN), S_I(Y{:}PN)) \in ext_I(\text{subPropertyOf})$.

We skip the formal definition of OWL Full here. The new features are added in a
similar way to the other languages. In the definition of the signatures we remove the
restriction that the sets CN, PN, IN be pairwise disjoint. The corresponding restriction
from the definition of models is also removed.

The original definition of OWL Full is given directly over RDF Schema. Here
we refer to an RDF-independent definition of OWL Full. The fact that OWL Full is built
over RDF Schema is given by the following result.


Proposition 4. There is a conservative comorphism from RDFS to OWLFull$^{th}$.





It is easy to see that we cannot embed RDFS in OWLDesLog. For instance, triples
like (rdf:type, rdf:type, rdf:Property) cannot be encoded in OWLDesLog but can
be expressed as a sentence in OWLFull.

If $(\Sigma, F)$ is an ontology in OWL DL, then, syntactically, it is also an ontology
in OWL Full. However, the class of $(\Sigma, F)$-models in OWLFull is richer than that
of $(\Sigma, F)$-models in OWLDesLog. The reason is that in OWL Full we removed some
model constraints which are present in OWL DL. Hence we have the following result:


Proposition 5. There is no embedding comorphism from OWLDesLog to OWLFull.


The proposition above has a drastic consequence: it could be unsound to use OWL Full
reasoners for OWL DL ontologies. This is referred to in the literature as inappropriate
layering [5].


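The relationships between SW logics are expressed by the following diagram:


[Diagram: the institutions OWLLite$^-$, OWLLite, OWLDesLog$^-$, OWLDesLog and OWLFull,
with arrows for the comorphisms stated in the propositions above.]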





Between OWLFull and OWLDesLog we can define only a “syntactical comorphism”
consisting of an inclusion functor $\Phi : Sign(OWLDesLog) \to Sign(OWLFull)$ and a
natural transformation $\alpha : sen(OWLDesLog) \Rightarrow \Phi; sen(OWLFull)$.


5 RDF Serialization of Semantic Web Logics


In this section, we define the RDF serialization of the SW languages discussed in the
previous section. In terms of institution theory, an RDF serialization is a comorphism
(encoding) from the source language to an institution built over an RDF theory as in
Section 2.2. Since the corresponding theories are related by morphisms, we get an
indexed institution. This approach results in a much clearer understanding of the
relationship among the various languages, as is shown at the end of the section.


5.1 RDF Serialization of OWL Lite$^-$


We define the theory OWLLM = (OWLLMVoc, $T_{OWLLM}$), where OWLLMVoc is RDFSVoc
together with an enumerable set of anonymous names and the symbols


owl:allValuesFrom, owl:Class, owl:equivalentClass,
owl:equivalentProperty, owl:hasValue, owl:inverseOf,
owl:minCardinality, owl:ObjectProperty,
owl:SymmetricProperty, owl:TransitiveProperty,
owl:Restriction, owl:onProperty


and $T_{OWLLM}$ is defined as $T_{OWLLM} = T_{RDFS} \cup$


{
(owl:Class, rdfs:subClassOf, rdfs:Class),
(owl:allValuesFrom, rdf:type, rdf:Property),
(owl:allValuesFrom, rdfs:domain, rdf:Property),
(owl:equivalentProperty, rdf:type, rdf:Property),
(owl:equivalentProperty, rdfs:domain, rdf:Property),
(owl:equivalentProperty, rdfs:subPropertyOf, rdfs:subPropertyOf),
(owl:ObjectProperty, rdf:type, rdfs:Class),
(owl:inverseOf, rdf:type, rdf:Property),
}


The anonymous names are used in translating OWL sentences into conjunctions of
triples [20]. As for RDF and RDF Schema, we suppose that there is a given set
$R_{OWLLM}$ of OWL Lite$^-$ resources and a function
$S_{OWLLM} : OWLLMVoc \to R_{OWLLM}$ which associates a resource to each OWL Lite$^-$
symbol, and satisfies $S_{OWLLM}|_{RDFSVoc} = S_{RDFS}$.

For each theory $(RR, T)$ such that there is a theory morphism
$f : OWLLM \to (RR, T)$, we define the model constraint $[\![RR, T]\!]_{OWLLM}$
obtained by strengthening $[\![RR, T]\!]_{RDFS}$ with the following conditions.
If $I \in [\![RR, T]\!]_{OWLLM}$ then:


- $R_I$ includes $R_{OWLLM}$ and the restriction of $S_I$ to OWLLMVoc coincides with
  $S_{OWLLM}$.
- $ext_I(\text{owl:Class}) \cap ext_I(\text{owl:ObjectProperty}) = \emptyset$
- $(\forall x, y)\ (x, \text{owl:Class}) \in ext_I(\text{rdf:type}) \land
  (y, x) \in ext_I(\text{rdf:type}) \Rightarrow
  ((y, \text{owl:Class}) \notin ext_I(\text{rdf:type})) \land
  ((y, \text{rdf:Property}) \notin ext_I(\text{rdf:type}))$
- $(\forall x, y)\ (x, y) \in ext_I(\text{owl:equivalentClass}) \Rightarrow
  (x, S_I(\text{owl:Class})) \in ext_I(\text{rdf:type}) \land
  (y, S_I(\text{owl:Class})) \in ext_I(\text{rdf:type}) \land ext_I(x) = ext_I(y)$
- $(\forall u, w, v)\ (w, S_I(\text{owl:Restriction})) \in ext_I(\text{rdf:type}) \land
  (w, u) \in ext_I(\text{owl:onProperty}) \land
  (w, v) \in ext_I(\text{owl:allValuesFrom}) \Rightarrow
  ext_I(w) = \{x \mid (x, y) \in ext_I(u) \Rightarrow y \in ext_I(v)\}$
- $(\forall u, v, w)\ (w, S_I(\text{owl:Restriction})) \in ext_I(\text{rdf:type}) \land
  (w, u) \in ext_I(\text{owl:onProperty}) \land
  (w, v) \in ext_I(\text{owl:hasValue}) \Rightarrow
  ext_I(w) = \{x \mid (x, v) \in ext_I(u)\}$
- $(\forall u, w, y)\ (w, S_I(\text{owl:Restriction})) \in ext_I(\text{rdf:type}) \land
  (w, u) \in ext_I(\text{owl:onProperty}) \land
  (w, 0) \in ext_I(\text{owl:minCardinality}) \Rightarrow
  ext_I(w) = \{x \mid \#(\{(x, y) \in ext_I(u)\}) \geq 0\}$
- $(\forall v, w, x, y)\ (x, y) \in ext_I(\text{owl:inverseOf}) \Rightarrow
  (x, S_I(\text{owl:ObjectProperty})) \in ext_I(\text{rdf:type}) \land
  (y, S_I(\text{owl:ObjectProperty})) \in ext_I(\text{rdf:type}) \land
  ((w, v) \in ext_I(x) \Leftrightarrow (v, w) \in ext_I(y))$
- $(\forall u, x, y)\ (u, S_I(\text{owl:SymmetricProperty})) \in ext_I(\text{rdf:type})
  \Rightarrow ((x, y) \in ext_I(u) \Leftrightarrow (y, x) \in ext_I(u))$
- $(\forall u, x, y, z)\ (u, S_I(\text{owl:TransitiveProperty})) \in ext_I(\text{rdf:type})
  \land (x, y) \in ext_I(u) \land (y, z) \in ext_I(u) \Rightarrow (x, z) \in ext_I(u)$
- $(\forall x, y)\ (x, y) \in ext_I(\text{owl:equivalentProperty}) \Rightarrow
  (x, S_I(\text{rdf:Property})) \in ext_I(\text{rdf:type}) \land
  (y, S_I(\text{rdf:Property})) \in ext_I(\text{rdf:type}) \land ext_I(x) = ext_I(y)$


The second and the third conditions say that the vocabulary is separated: a resource
cannot be a class, an individual and/or a property at the same time. The last three
conditions give the intended meaning of the symmetric property, transitive property,
and equivalent property, respectively. The other conditions give semantics to the new
syntactical constructions. We denote by $\widehat{OWLLM}$ the institution generated by
the theory OWLLM and the model constraint $[\![\_]\!]_{OWLLM}$ using the method
presented in Section 2.2.
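
On a finite interpretation, conditions of this kind can be checked mechanically. The Python sketch below covers only the symmetric, transitive and inverse property conditions; it is an illustration under simplifying assumptions (resources are identified with their names, so $S_I$ is the identity, and the interpretation is a dictionary `I["ext"]` from resources to sets of pairs). The function names are hypothetical.

```python
def is_symmetric(rel):
    """(x, y) in rel implies (y, x) in rel."""
    return all((y, x) in rel for (x, y) in rel)


def is_transitive(rel):
    """(x, y) and (y, z) in rel imply (x, z) in rel."""
    return all((x, z) in rel
               for (x, y) in rel
               for (y2, z) in rel
               if y == y2)


def violations(I):
    """List the property conditions that the finite interpretation I violates."""
    ext = I["ext"]
    found = []
    # Conditions triggered by rdf:type membership.
    for (u, cls) in sorted(ext.get("rdf:type", set())):
        rel = ext.get(u, set())
        if cls == "owl:SymmetricProperty" and not is_symmetric(rel):
            found.append((u, "not symmetric"))
        if cls == "owl:TransitiveProperty" and not is_transitive(rel):
            found.append((u, "not transitive"))
    # Condition triggered by owl:inverseOf pairs.
    for (p, q) in sorted(ext.get("owl:inverseOf", set())):
        flipped = {(y, x) for (x, y) in ext.get(p, set())}
        if flipped != set(ext.get(q, set())):
            found.append((p, "not inverse of " + q))
    return found
```

An interpretation belongs to the strengthened model class only if `violations` returns an empty list for it; the remaining conditions (vocabulary separation, restrictions) would need analogous checks.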


Proposition 6. There is a conservative comorphism $(\Phi, \beta, \alpha)$ from
OWLLite$^-$ to $(\widehat{OWLLM})^{th}$.


Here is a brief description of the comorphism given by Proposition 6.

$\Phi : Sign(OWLLite^-) \to Sign((\widehat{OWLLM})^{th})$ is defined as follows. If
$\Sigma = (CN, PN, IN)$ is an OWL Lite$^-$ signature, then $\Phi(\Sigma)$ is the theory
morphism $OWLLM \to (RR, T)$, where $RR = OWLLMVoc \cup CN \cup PN \cup IN$, and $T$
includes $T_{OWLLM}$ together with:
- a triple (cn, rdf:type, owl:Class) for each $cn \in CN$,
- a triple (pn, rdf:type, owl:ObjectProperty) for each $pn \in PN$, and
- a triple (in, rdf:type, rdf:Resource) for each $in \in IN$.

$\beta : \Phi; Mod((\widehat{OWLLM})^{th}) \Rightarrow Mod(OWLLite^-)$ is defined as
follows. If $I'$ is a $\Phi(\Sigma)$-model, then $\beta_\Sigma(I') = I$, where
$R_I = R_{I'}$, $S_I(name) = S_{I'}(name)$ for each $name \in CN \cup PN \cup IN$,
$ext_I(pn) = ext_{I'}(pn)$, and
$ext_I(cn) = \{x \mid (x, S_{I'}(cn)) \in ext_{I'}(\text{rdf:type})\}$.

$\alpha : sen(OWLLite^-) \Rightarrow \Phi; sen((\widehat{OWLLM})^{th})$ is given such
that $\alpha_\Sigma$ associates with each OWL Lite$^-$ syntactical construction a set
(conjunction) of triples similar to that defined in [20]. If $\Sigma$ is an
OWL Lite$^-$ signature and $M$ a $\Sigma$-model, then $M$ can be extended to a
$\Phi(\Sigma)$-model by giving semantics to symbols in OWLLMVoc according to triples in
$T_{OWLLM}$.
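
The signature and model components of this comorphism can be sketched on finite data. The Python below is a minimal illustration, not the paper's construction in full: the names `phi` and `beta`, the representation of triples as tuples of strings, and the dictionary layout of a model (`R`, `S`, `ext` keyed by resource) are assumptions made for the example.

```python
def phi(CN, PN, IN, owllm_voc, t_owllm):
    """Map an OWL Lite^- signature (CN, PN, IN) to an RDF theory (RR, T)."""
    RR = set(owllm_voc) | set(CN) | set(PN) | set(IN)
    T = set(t_owllm)
    T |= {(cn, "rdf:type", "owl:Class") for cn in CN}
    T |= {(pn, "rdf:type", "owl:ObjectProperty") for pn in PN}
    T |= {(ind, "rdf:type", "rdf:Resource") for ind in IN}
    return RR, T


def beta(CN, PN, IN, I2):
    """Reduce a model I2 of phi(Sigma) to an OWL Lite^- model of Sigma.

    Assumed layout: I2["R"] is the resource set, I2["S"] maps symbols to
    resources, I2["ext"] maps a resource to its set of pairs.
    """
    S2, ext2 = I2["S"], I2["ext"]
    type_pairs = ext2.get(S2["rdf:type"], set())
    return {
        "R": set(I2["R"]),
        "S": {name: S2[name] for name in set(CN) | set(PN) | set(IN)},
        # Properties keep their extension.
        "pext": {pn: set(ext2.get(S2[pn], set())) for pn in PN},
        # A class extension collects everything typed by the class resource.
        "cext": {cn: {x for (x, c) in type_pairs if c == S2[cn]} for cn in CN},
    }
```

Note how the class extension is recovered from the extension of rdf:type, exactly as in the definition of $\beta$ above.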


5.2 RDF Serialization of OWL Lite


We define the theory OWLL = (OWLLVoc, $T_{OWLL}$), where OWLLVoc is OWLLMVoc together
with


owl:Thing, owl:Nothing, owl:FunctionalProperty,
owl:SameIndividual, DifferentIndividuals, owl:someValues,
owl:maxCardinality


and $T_{OWLL}$ is defined as $T_{OWLL} = T_{OWLLM} \cup$


{
(owl:Thing, rdf:type, rdfs:Class),
(owl:Nothing, rdf:type, rdfs:Class),
(owl:FunctionalProperty, rdf:type, rdfs:Class),
(owl:InverseFunctionalProperty, rdf:type, rdfs:Class),
(owl:InverseFunctionalProperty, rdfs:subClassOf, owl:ObjectProperty),
(owl:sameAs, rdf:type, rdf:Property),
}


As for OWL Lite$^-$, we suppose that there is given a set $R_{OWLL}$ of RDF Schema
resources and a function $S_{OWLL} : OWLLVoc \to R_{OWLL}$ which associates a resource
to each OWL Lite symbol, and satisfies $S_{OWLL}|_{OWLLMVoc} = S_{OWLLM}$.

For each theory $(RR, T)$ such that there is a theory morphism
$f : OWLL \to (RR, T)$ we define the model constraint $[\![RR, T]\!]_{OWLL}$ obtained
by strengthening $[\![RR, T]\!]_{OWLLM}$ with the following conditions.
If $I \in [\![RR, T]\!]_{OWLL}$ then:


- $R_I$ includes $R_{OWLL}$ and the restriction of $S_I$ to OWLLVoc coincides with
  $S_{OWLL}$.
- $I$ satisfies the restrictions expressing the intended meaning of the new features.


We denote by $\widehat{OWLL}$ the institution generated by the theory OWLL and the
model constraint $[\![\_]\!]_{OWLL}$ using the method presented in Section 2.2.


Theorem 1. There is a conservative comorphism from OWLLite to $(\widehat{OWLL})^{th}$.


5.3 RDF Serialization of OWL DL$^-$


We define the theory OWLDLM = (OWLDLMVoc, $T_{OWLDLM}$), where the vocabulary
OWLDLMVoc is OWLLMVoc together with


owl:SubClassOf, owl:intersectionOf, owl:unionOf, owl:oneOf,
owl:someValues, owl:hasValue


and $T_{OWLDLM}$ is defined as $T_{OWLDLM} = T_{OWLLM} \cup$


{
(owl:intersectionOf, rdf:type, rdf:Property),
(owl:unionOf, rdf:type, rdf:Property),
(owl:oneOf, rdf:type, rdf:Property),
(owl:oneOf, rdfs:range, rdf:List),
}


As usual, we suppose that there is a given set $R_{OWLDLM}$ of RDF Schema resources
and a function $S_{OWLDLM} : OWLDLMVoc \to R_{OWLDLM}$ which associates a resource to
each OWL DL$^-$ symbol, and satisfies $S_{OWLDLM}|_{OWLLMVoc} = S_{OWLLM}$.

For each theory $(RR, T)$ such that there is a theory morphism
$f : OWLDLM \to (RR, T)$ we define the model constraint $[\![RR, T]\!]_{OWLDLM}$
obtained by strengthening $[\![RR, T]\!]_{OWLLM}$ with the following conditions.
If $I \in [\![RR, T]\!]_{OWLDLM}$ then:


- $R_I$ includes $R_{OWLDLM}$ and the restriction of $S_I$ to OWLDLMVoc is
  $S_{OWLDLM}$.
- $I$ satisfies the restrictions expressing the intended meaning of the new features.


We denote by $\widehat{OWLDLM}$ the institution generated by the theory OWLDLM and the
model constraint $[\![\_]\!]_{OWLDLM}$ using the method presented in Section 2.2.


Theorem 2. There is a conservative comorphism from OWLDesLog$^-$ to
$(\widehat{OWLDLM})^{th}$.


5.4 RDF Serialization of OWL DL 


We define the theory OWLDL = (OWLDLVoc, $T_{OWLDL}$), where OWLDLVoc is OWLLVoc
together with


owl:DeprecatedClass, owl:DisjointClasses, owl:SubClassOf,
owl:Functional, owl:InverseFunctional, owl:Transitive,
owl:SameIndividual, DifferentIndividuals, owl:someValues,
owl:Thing, owl:Nothing,
owl:intersectionOf, owl:unionOf, owl:complementOf,
owl:oneOf, owl:someValues, owl:hasValue, owl:maxCardinality


and $T_{OWLDL}$ is defined as $T_{OWLDL} = T_{OWLL} \cup$


{
(owl:intersectionOf, rdf:type, rdf:Property),
(owl:intersectionOf, rdfs:domain, owl:Class),
(owl:equivalentClass, rdf:type, rdf:Property),
(owl:disjointWith, rdf:type, rdf:Property),
}


As usual, we suppose that there is given a set $R_{OWLDL}$ of RDF Schema resources
and a function $S_{OWLDL} : OWLDLVoc \to R_{OWLDL}$ which associates a resource to each
OWL DL symbol, and satisfies $S_{OWLDL}|_{OWLLVoc} = S_{OWLL}$.

For each theory $(RR, T)$ such that there is a theory morphism
$f : OWLDL \to (RR, T)$ we define the model constraint $[\![RR, T]\!]_{OWLDL}$ obtained
by strengthening $[\![RR, T]\!]_{OWLL}$ with the following conditions.
If $I \in [\![RR, T]\!]_{OWLDL}$ then:


- $R_I$ includes $R_{OWLDL}$ and the restriction of $S_I$ to OWLDLVoc coincides with
  $S_{OWLDL}$.
- $I$ satisfies the restrictions expressing the intended meaning of the new features.


We denote by $\widehat{OWLDL}$ the institution generated by the theory OWLDL and the
model constraint $[\![\_]\!]_{OWLDL}$ using the method presented in Section 2.2.


Theorem 3. There is a conservative comorphism from OWLDesLog to
$(\widehat{OWLDL})^{th}$.


5.5 RDF Serialization of OWL Full


The theory OWLF = (OWLFVoc, $T_{OWLF}$) is defined as follows. The vocabulary OWLFVoc
includes OWLDLVoc and the symbols corresponding to the new features. Similarly,
$T_{OWLF}$ includes $T_{OWLDL}$ together with triples restricting the use of the new
symbols as intended and triples expressing the equality of the parts of the OWL
universe with their analogues in RDF Schema.


$R_{OWLF}$ and $S_{OWLF}$ are defined as usual. The model constraint
$[\![\_]\!]_{OWLF}$ includes:


- The restriction corresponding to $R_{OWLF}$ and $S_{OWLF}$.
- Restrictions expressing the intended meaning of all the features.
- Restrictions that force the parts of the OWL universe to be the same as their
  analogues in RDF.


The vocabulary separation restriction is not included in $[\![\_]\!]_{OWLF}$.


Theorem 4. There is a conservative comorphism from OWLFull to $(\widehat{OWLF})^{th}$.


5.6 Summing Up 


All institutions we defined in this paper and their relationships are represented in
Figure 2. The lower side includes the RDF indexed institution and gives the semantics
for the RDF layer in Berners-Lee’s stack. It is worth noting that all arrows are
institution morphisms; hence we may define the semantics of the layer as being the
Grothendieck institution defined by this indexed institution. The upper side includes
the institutions corresponding to the SW languages and their relationships expressed
as comorphisms. The Grothendieck institution defined by this indexed institution gives
the semantics for the ontology layer. The semantics of the layering of web ontology
languages on the RDF framework is given by a comorphism of Grothendieck institutions.
Note that the embedding of RDF Schema in OWL Full is not a component of this
comorphism.


6 Conclusion 


The multitude of languages causes a certain confusion in the Semantic Web community,
as they are based on different formalisms (description logics, Datalog, RDF Schema,
etc.). A careful and thorough investigation of the relationships among the various
languages will certainly reveal subtle differences among them.

Institutions and institution morphisms were developed to capture the notion of
“logical systems” and relate software systems regardless of the underlying logical
system. Hence, it is natural to use institutions to represent the various Semantic Web
languages (including RDF and RDF Schema) and study their relationships using
institution morphisms.

In this paper, based on RDF(S), we define indexed institutions for the RDF framework
layer and the ontology layer. An overall relationship among all these languages can be
seen in Fig. 2. The figure shows that the institution approach can precisely capture
the relationship among the various languages. The work presented in this paper opens
up a new practical application domain for institution theory.


[Figure: the ontology layer (OWLFull, OWLDesLog, OWLDesLog$^-$, OWLLite, OWLLite$^-$)
drawn above the RDFS layer ($(\widehat{OWLF})^{th}$, $(\widehat{OWLDL})^{th}$,
$(\widehat{OWLDLM})^{th}$, $(\widehat{OWLL})^{th}$, $(\widehat{OWLLM})^{th}$,
$(\widehat{RDFS})^{th}$), with arrows labelled "co" for comorphisms.]

Fig. 2. RDF serialization

One direction for future work is to further investigate the relationship of various
ontology languages with regard to their respective underlying logical systems.
Languages such as the OWL suite (OWL Lite, DL and Full) are based on description logics
and they assume an “Open World Assumption”. On the other hand, languages such as OWL
Flight and WRL are based on logic programming and they assume a “Closed World
Assumption”. The interoperability of these kinds of languages has been intensively
discussed but is still an open question. We believe institution theory can help to
clarify this issue by establishing links at the logical level. It is also of interest
to investigate the properties of the indexed institution, such as theory colimits,
liberality, exactness and inclusions, and how the design of tools for the SW can
benefit from these properties.


Abstract. We propose to use Grothendieck institutions based on 2-categorical
diagrams as a basis for heterogeneous specification. We prove a number of results
about colimits and (some weak variants of) exactness. This framework can also be used
for obtaining proof systems for heterogeneous theories involving institution
semi-morphisms.


1 Introduction 


“There is a population explosion among the logical systems used in com- 
puter science. Examples include first order logic, equational logic, Horn 
clause logic, higher order logic, infinitary logic, dynamic logic, intuition- 
istic logic, order-sorted logic, and temporal logic; moreover, there is a 
tendency for each theorem prover to have its own idiosyncratic logical 
system. We introduce the concept of institution to formalize the informal 
notion of ‘logical system’.” 


This famous quote from Joseph Goguen’s and Rod Burstall’s seminal paper
introducing institutions led, in its consequences, also to the introduction of
Grothendieck institutions by Razvan Diaconescu [5], which provide the semantic
basis for heterogeneous specifications, i.e. the involvement of a multitude of
logical systems within a single specification.

While the properties of Grothendieck institutions and their interaction with
colimits, exactness, liberality, Craig interpolation etc. are well studied now (cf.
the forthcoming book [4]), the present theory of Grothendieck institutions still does
not answer certain practical problems. During the development of the heterogeneous
tool set (Hets) [15,17], a parsing, static analysis and proof management tool for
heterogeneous specifications, we have encountered the following problems:


- often there is a plethora of possible translations between two given institutions,
  making choice difficult for the user;
- often premises for theorems about Grothendieck institutions do not hold for some of
  the institutions involved; however, failure of a premise just for one institution
  usually destroys applicability of a theorem;
- also, the premises needed for institution (co)morphisms do not hold in all cases;
- finally, this means that the applicability of theorem proving for structured
  specifications is limited for Grothendieck institutions, and hence for
  heterogeneous specifications.


We introduce two ideas that may help to solve these problems: the use of
institutional 2-cells, and the weakening of exactness properties to quasi-exactness.
We prove a number of properties of these and discuss examples. Proofs can be
found in the appendix.


K. Futatsugi et al. (Eds.): Goguen Festschrift, LNCS 4060, pp. 124-149, 2006.


2 Institutions 


Let CAT be the category of categories and functors.$^1$

Definition 1. An institution $I = (Sign, Sen, Mod, \models)$ consists of

- a category $Sign$ of signatures,
- a functor $Sen : Sign \to Set$ giving, for each signature $\Sigma$, the set of
  sentences $Sen(\Sigma)$, and for each signature morphism
  $\sigma : \Sigma \to \Sigma'$, the sentence translation map
  $Sen(\sigma) : Sen(\Sigma) \to Sen(\Sigma')$, where often $Sen(\sigma)(\varphi)$ is
  written as $\sigma(\varphi)$,
- a functor $Mod : Sign^{op} \to CAT$ giving, for each signature $\Sigma$, the
  category of models $Mod(\Sigma)$, and for each signature morphism
  $\sigma : \Sigma \to \Sigma'$, the reduct functor
  $Mod(\sigma) : Mod(\Sigma') \to Mod(\Sigma)$, where often $Mod(\sigma)(M')$ is
  written as $M'|_\sigma$,
- a satisfaction relation $\models_\Sigma \subseteq |Mod(\Sigma)| \times Sen(\Sigma)$
  for each $\Sigma \in |Sign|$,

such that for each $\sigma : \Sigma \to \Sigma'$ in $Sign$ the following satisfaction
condition holds:

$$M' \models_{\Sigma'} \sigma(\varphi) \iff M'|_\sigma \models_\Sigma \varphi$$

for each $M' \in |Mod(\Sigma')|$ and $\varphi \in Sen(\Sigma)$.
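
As a small sanity check of the satisfaction condition, the following Python sketch instantiates the definition for propositional logic. This instance is chosen here only for illustration and is not one of the paper's examples: signatures are sets of variable names, a signature morphism is a dictionary renaming variables, sentences are nested tuples, and models are truth assignments. All function names are hypothetical.

```python
from itertools import product


def translate(sigma, phi):
    """Sen(sigma): rename the variables of sentence phi along sigma."""
    if isinstance(phi, str):
        return sigma[phi]
    op, *args = phi
    return (op, *[translate(sigma, a) for a in args])


def reduct(sigma, model2):
    """Mod(sigma): turn a model of the target signature into one of the source."""
    return {p: model2[q] for p, q in sigma.items()}


def satisfies(model, phi):
    """Satisfaction of a propositional sentence in a truth assignment."""
    if isinstance(phi, str):
        return model[phi]
    op, *args = phi
    if op == "not":
        return not satisfies(model, args[0])
    if op == "and":
        return all(satisfies(model, a) for a in args)
    if op == "or":
        return any(satisfies(model, a) for a in args)
    raise ValueError("unknown connective: %r" % (op,))


def all_models(signature):
    """Enumerate every truth assignment over a finite signature."""
    names = sorted(signature)
    for values in product([False, True], repeat=len(names)):
        yield dict(zip(names, values))


def satisfaction_condition_holds(sigma, target_signature, phi):
    """Check M' |= sigma(phi) iff M'|sigma |= phi for all target models M'."""
    return all(
        satisfies(m2, translate(sigma, phi)) == satisfies(reduct(sigma, m2), phi)
        for m2 in all_models(target_signature)
    )
```

The check also succeeds for non-injective morphisms such as `{"p": "a", "q": "a"}`, since translation and reduct are defined by the same renaming.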











Institutions can alternatively, and more succinctly, be characterized as functors
into a certain category of “twisted relations” [10], called “rooms” in [9]:
An institution room $(S, M, \models)$ consists of

- a set $S$ of sentences,
- a category $M$ of models, and
- a satisfaction relation $\models \subseteq |M| \times S$.

Rooms are connected via corridors (which model change of notation within
one logic, as well as translations between logics).
An institution corridor
$(\alpha, \beta) : (S_1, M_1, \models_1) \to (S_2, M_2, \models_2)$ consists of

- a sentence translation function $\alpha : S_1 \to S_2$, and
- a model reduction functor $\beta : M_2 \to M_1$, such that

$$M_2 \models_2 \alpha(\varphi_1) \iff \beta(M_2) \models_1 \varphi_1$$

holds for each $M_2 \in |M_2|$ and each $\varphi_1 \in S_1$ (satisfaction condition).
Now, an institution can equivalently be defined to be just a functor
$I : Sign \to InsRoom$ (where $Sign$ is the category of signatures).


1 Strictly speaking, CAT is not a category but only a so-called quasicategory, which
is a category that lives in a higher set-theoretic universe [11].








Example 2. The institution $FOL^=$ of many-sorted first-order logic with equality.
Signatures are many-sorted first-order signatures, i.e. many-sorted algebraic
signatures enriched with predicate symbols with arities. Signature morphisms
map signature symbols in a coherent way. Models are many-sorted first-order
structures, and model morphisms are standard algebra homomorphisms that
preserve the holding of predicates. Model (morphism) reduction is done by
renaming model (morphism) components. Sentences are first-order formulas, and
sentence translation means replacement of the translated symbols. Satisfaction
is the usual satisfaction of a first-order sentence in a first-order structure.














Example 3. The institution $Eq^=$ of equational logic is the restriction of $FOL^=$
to signatures without predicates, and (universally quantified) equations as the
only sentences.














Example 4. The institution $PFOL^=$ of partial first-order logic with equality.
Signatures are many-sorted first-order signatures enriched by partial function
symbols. Models are many-sorted partial first-order structures. Sentences are
first-order formulas containing existential equations, strong equations,
definedness statements and predicate applications as atomic formulas. Satisfaction is
defined using total valuations of variables, while valuation of terms is partial
due to the existence of partial functions. An existential equation holds if both
sides are defined and equal, whereas a strong equation also holds if both sides
are undefined. A definedness statement holds if the term is defined. A predicate
application holds if the terms contained in it are defined, and the corresponding
tuple of values is in the interpretation of the predicate. This is extended to
first-order formulas as usual. Moreover, signature morphisms, model reductions
and sentence translations are defined like in $FOL^=$.














Example 5. The CASL institution extends $PFOL^=$ with subsorting and induction
(for datatypes), see [14,3] for details. CASL has, among others, a modal logic
extension ModalCASL [15] and a coalgebraic extension CoCASL [18].














Example 6. There is an institution PLNG of a programming language [21]. It
is built over an algebra of built-in data types and operations of a programming
language. Signatures are given as function (functional procedure) headings;
sentences are function bodies; and models are maps that, for each function symbol,
assign a computation (either diverging, or yielding a result) to any sequence of
actual parameters. A model satisfies a sentence iff it assigns to each sequence
of parameters the computation of the function body as given by the sentence.
Hence, sentences determine particular functions in the model uniquely. Finally,
signature morphisms, model reductions and sentence translations are defined
similarly to those in $FOL^=$.














Institution morphisms [10,7] relate two given institutions. A typical situation
is that an institution morphism expresses the fact that a “larger” institution is
built upon a “smaller” institution by projecting the “larger” institution onto
the “smaller” one. Dually, institution comorphisms [7] typically express that an
institution is included, or encoded into another one.

Using the notation of institutions as functors, given institutions
$I_1 : Sign_1 \to InsRoom$ and $I_2 : Sign_2 \to InsRoom$, an institution morphism
$(\Psi, \mu) : I_1 \to I_2$ consists of a functor $\Psi : Sign_1 \to Sign_2$ and a
natural transformation $\mu : I_2 \circ \Psi \to I_1$. (Alternatively, we split $\mu$
into two natural transformations, denoted by $\alpha$ and $\beta$.) By contrast, an
institution comorphism $(\Phi, \rho) : I_1 \to I_2$ consists of a functor
$\Phi : Sign_1 \to Sign_2$ and a natural transformation
$\rho : I_1 \to I_2 \circ \Phi$.

Together with obvious identities and composition, this gives us the category
Ins (CoIns) of institutions and institution (co)morphisms. An institution
semi-(co)morphism is like an institution (co)morphism, but without the sentence
translation component (and hence also without the satisfaction condition).


Example 7. There is an institution morphism going from first-order logic with 
equality to equational logic. A first-order signature is translated to an algebraic 
signature by just forgetting the set of predicate symbols; similarly, a first-order 
model is turned into an algebra by forgetting the predicates. Sentence translation 
is just inclusion of equations into first-order sentences. 
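
The three components of this morphism can be sketched on a toy data representation. The Python below is illustrative only: the dictionary layout of signatures and models (`sorts`, `ops`, `preds`, `carriers`) and the function names are assumptions made here, not definitions from the text. Note the directions: signatures and models are mapped from first-order logic to equational logic, while sentences travel the other way.

```python
def forget_signature(fol_sig):
    """Signature translation: drop the predicate symbols."""
    return {"sorts": set(fol_sig["sorts"]), "ops": dict(fol_sig["ops"])}


def forget_model(fol_model):
    """Model translation: keep carriers and operations, drop the predicates."""
    return {
        "carriers": dict(fol_model["carriers"]),
        "ops": dict(fol_model["ops"]),
    }


def include_equation(equation):
    """Sentence translation: an equation is already a first-order sentence."""
    return equation
```

Because predicates never occur in equations, an equation holds in a first-order model exactly when it holds in the algebra obtained by `forget_model`, which is the satisfaction condition for this morphism.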














Example 8. There is an institution semi-morphism toCASL from PLNG to CASL
[21]. It extracts an algebraic signature with partial operations out of a PLNG- 
signature by adding the signature of built-in data types and operations of the 
programming language. For any function declared, any PLNG-model determines 
its computations on given arguments, from which we can extract a partial func- 
tion that maps any sequence of arguments to the result of the computation (if 
any). These are used to expand the built-in algebra of data types and opera- 
tions of the programming language with an interpretation for the extra function 
names in the signature obtained. 














Example 9. There is an institution comorphism going from equational logic to 
first-order logic with equality. An algebraic signature is translated to a first- 
order signature by just taking the set of predicate symbols to be empty. Sentence 
translation is just inclusion of equations into first-order sentences. A first-order 
model with empty set of predicates is translated by just considering it as an 
algebra. 














Example 10. Similarly, there are obvious inclusion comorphisms from CASL to
ModalCASL and CoCASL, see [15].


Example 11. Define an institution comorphism going from partial first-order
logic with equality to first-order logic with equality as follows: A partial
first-order signature is translated to a total one by encoding each partial function
symbol as a total one, plus a (new) unary predicate $D$ (“definedness”) and a
(new) function symbol $\bot$ (“undefined”) for each sort (this means that $\bot$ and
$D$ are heavily overloaded). Furthermore, we add axioms$^2$ stating that $D$ does
not hold on $\bot$, and that (encoded) total functions preserve (“totality”) and
reflect (“strictness”) $D$, while partial functions only reflect $D$ (and the holding
of predicates implies $D$ to hold on the arguments). Sentence translation is done by
replacing all partial function symbols by the total function symbols encoding
them, replacing strong equations $t = u$ by $(D(t) \lor D(u)) \Rightarrow t = u$,
existence equations by conjunctions of the equation and the definedness (using $D$)
of one of the sides of the equation, replacing definedness with $D$, and leaving
predicate symbols as they are. For a given total model of the translated signature, we
just take as carriers of the partial model the interpretations of the definedness
predicates in the total model, while the total functions are restricted to these
new carriers, yielding partial functions.
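
The encoding can be illustrated on a single sort with one unary function. The Python sketch below is a simplification made for this illustration: formulas are nested tuples, a total function is a dictionary on a finite carrier, and the names `translate_strong_equation`, `translate_existential_equation` and `to_partial_model` are hypothetical.

```python
def translate_strong_equation(t, u):
    """Strong equation t = u becomes (D(t) or D(u)) implies t = u."""
    return ("implies", ("or", ("D", t), ("D", u)), ("eq", t, u))


def translate_existential_equation(t, u):
    """Existence equation becomes the equation plus definedness of one side."""
    return ("and", ("eq", t, u), ("D", t))


def to_partial_model(defined, total_fn):
    """Model reduction for one sort and one unary function.

    `defined` is the interpretation of D in the total model; `total_fn` is
    the total function as a dict. The new carrier is `defined`, and the
    function is restricted to arguments and results inside that carrier.
    """
    carrier = set(defined)
    partial_fn = {
        x: y for x, y in total_fn.items()
        if x in carrier and y in carrier
    }
    return carrier, partial_fn
```

For a predecessor function on {0, 1, 2} encoded with an extra element standing for $\bot$, the reduction drops $\bot$ from the carrier and leaves the predecessor undefined at 0.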














3 Institution (Co)Morphism Modifications 


A typical experience with using the heterogeneous tool set [15,17] is the following:
for some specification, you want to prove a theorem, and hence want to see
a list of its possible translations (along (co)morphisms) into tool-supported
institutions. Now even with a small diagram of institutions, the list can become
quite large, because composites should also be shown (see Fig. 1 for a menu of
such translations). Such lists generally bear a lot of redundancy, since two
different translation paths, though differing as (co)morphisms, lead to essentially
the same results, as the following example shows:


Example 12. There are two ways to go from equational logic to first-order logic:
one is the obvious subinstitution comorphism $\rho_1$ from Example 9, the other one
is the composition $\rho_2$ of the obvious subinstitution comorphism from equational
logic to partial first-order logic composed with the encoding of partial first-order
logic into first-order logic from Example 11.$^3$ These comorphisms are different:
$\rho_2$ adds some (superfluous) coding of partiality. Yet, for e.g. the purpose of
re-using proof tools, $\rho_1$ and $\rho_2$ are essentially the same.


In this context, the notion of modification helps.

In order to ensure that the difference between two translations really is
inessential, a crucial property of modifications is that they do not lead to
identifications of different sentence or model translation maps. Hence, we strengthen
the original notion from [5] to discrete modifications:


2 Hence, strictly speaking, this comorphism is a so-called simple theoroidal one, see 
for details. 

3 Actually, since the latter is a simple theoroidal comorphism, we should take both to 
end in FOL”, the institution of FOL-theories. 
[Figure 1: screenshot of the HETS list of translations available for a CASL theory. Each entry is a path of comorphisms with its source and target logic, e.g. CASL2CoCASL : CASL -> CoCASL, CASL2PCFOL;PCFOL2FOL : CASL -> CASL, or CASL2CspCASL;CspCASL2Modal;Modal2CASL : CASL -> CASL.] 







Fig. 1. Dozens of translation possibilities for a CASL theory in HETS (from a 
logic graph without comorphism modifications; using modifications, the number 
of possible translations can be greatly reduced). 
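
The growth in the number of translation paths, and the effect of identifying paths, can be illustrated with a small sketch. The graph below is a toy fragment loosely modelled on the names visible in Fig. 1; it is not the actual HETS logic graph. The set `INESSENTIAL` is purely an assumption made for illustration: it pretends that the two endo-translations on CASL are related to the identity by a modification, so that inserting them into a path makes no essential difference. Deleting such steps is a crude stand-in for the congruence generated by modifications, not a definition of it.

```python
from collections import defaultdict

# Toy logic graph loosely modelled on Fig. 1: (comorphism name, source, target).
EDGES = [
    ("CASL2CoCASL", "CASL", "CoCASL"),
    ("CASL2Modal", "CASL", "Modal"),
    ("Modal2CASL", "Modal", "CASL"),
    ("CASL2PCFOL", "CASL", "CASL"),
    ("PCFOL2FOL", "CASL", "CASL"),
    ("CoCASL2IsabelleHOL", "CoCASL", "Isabelle"),
]

# ASSUMPTION (illustration only): these steps are linked to the identity
# by a modification, so inserting them into a path is inessential.
INESSENTIAL = {"CASL2PCFOL", "PCFOL2FOL"}


def paths_from(start, max_len):
    """Return all (path, target) pairs for non-empty paths of at most max_len edges."""
    out_edges = defaultdict(list)
    for name, src, tgt in EDGES:
        out_edges[src].append((name, tgt))
    frontier = [((), start)]
    result = []
    for _ in range(max_len):
        frontier = [
            (path + (name,), tgt)
            for path, node in frontier
            for name, tgt in out_edges[node]
        ]
        result.extend(frontier)
    return result


def normal_form(path):
    """Drop the steps assumed to be inessential."""
    return tuple(name for name in path if name not in INESSENTIAL)


all_paths = paths_from("CASL", 2)
classes = {(normal_form(path), target) for path, target in all_paths}
print(len(all_paths), "paths,", len(classes), "classes after identification")
```

Even for paths of length at most two, most of the enumerated paths collapse into a handful of classes once the assumed identifications are applied.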


Definition 13. Given institution morphisms $(\Psi, \mu)\colon I_1 \to I_2$ and $(\Psi', \mu')\colon I_1 \to I_2$, 
a discrete institution morphism modification $\theta\colon (\Psi, \mu) \to (\Psi', \mu')$ is just a 
natural transformation $\theta\colon \Psi \to \Psi'$ such that $\mu = \mu' \circ (I_2 \cdot \theta)$. Similarly, given 
institution comorphisms $(\Phi, \rho)\colon I_1 \to I_2$ and $(\Phi', \rho')\colon I_1 \to I_2$, a discrete 
institution comorphism modification $\theta\colon (\Phi, \rho) \to (\Phi', \rho')$ is a natural transformation 
$\theta\colon \Phi \to \Phi'$ such that $(I_2 \cdot \theta) \circ \rho = \rho'$. 


Together with obvious identities and compositions, modifications can serve as 
2-cells, leading to 2-categories Ins and CoIns. 





In [5/4], a weaker notion of institution morphism modification has been intro- 
duced, involving an additional natural transformation on the side of the models. 
We have not found this extra generality of practical use and hence work with the 
above stronger notion of discrete modification. However, since we will not use 
any non-discrete modification, we will omit the qualification of being discrete 
henceforth. 
Example 14. Consider the comorphisms from Example [2] 

[Diagram: $\rho_1$ goes directly from Eq to FOL, while $\rho_2$ goes from Eq via PFOL to FOL; the two paths are linked by a 2-cell $\theta$.] 

The comorphism modification $\theta\colon \rho_1 \to \rho_2$ is just the pointwise inclusion of an 
algebraic signature viewed as first-order signature into the theory coding a partial 
variant of that signature. 














Modifications also interplay with amalgamation: 


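Definition 15. Let $\rho = (\Phi, \alpha, \beta)\colon I_1 \to I_2$, $\rho_1 = (\Phi_1, \alpha_1, \beta_1)\colon I_1 \to J$ and 
$\rho_2 = (\Phi_2, \alpha_2, \beta_2)\colon I_2 \to J$ be three comorphisms. A lax triangle 

[Diagram: lax triangle with edges $\rho$, $\rho_1$ and $\rho_2$, filled by a modification $\theta$.] 

of institution comorphisms and modifications is called (weakly) amalgamable, if 

\[
\begin{array}{ccc}
Mod^{I_1}(\Sigma) & \xleftarrow{\ \beta_1\ } & Mod^{J}(\Phi_1(\Sigma)) \\
\uparrow{\scriptstyle \beta} & & \uparrow{\scriptstyle Mod^{J}(\theta_\Sigma)} \\
Mod^{I_2}(\Phi(\Sigma)) & \xleftarrow{\ \beta_2\ } & Mod^{J}(\Phi_2(\Phi(\Sigma)))
\end{array}
\]

is a (weak) pullback for each signature $\Sigma \in |Sign^{I_1}|$. 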


4 Colimits in Hom-Categories 


As a first result about the 2-categorical structure of CoIns, we examine colimits 
in the Hom-categories, which play a rôle for some results about the Grothendieck 
construction (see Prop. [22] below): 


Proposition 16. Given two institutions I and J, if J has pushouts of signa- 
tures, then the Hom-category CoIns(I, J) has pushouts as well. This generalizes 
to arbitrary non-empty colimits of connected diagrams. 
Note that initial objects in Hom-categories CoIns(I, J) generally do not 
exist: an initial comorphism from I to J would have to translate I-sentences to 
J-sentences over the initial signature, thereby losing any specific reference to the 
signature, which generally destroys the satisfaction condition. 

The dual situation is better for initial objects: 

Proposition 17. Given two institutions I and J, if J has an initial signature 
with empty set of sentences and terminal model category, then the Hom-category 
Ins(I, J) has an initial object. 














However, pushouts in Ins(I, J) seem to exist only under rather strong 
additional assumptions. 
We hence prefer to work with comorphisms in the sequel. 


5 Comorphism-Based Grothendieck Institutions 


Grothendieck institutions have been introduced by Diaconescu as a foun- 
dation for heterogeneous specification. The basic data for comorphism-based 
heterogeneous specification is a graph of institutions, comorphisms and modifi- 
cations. Remember from Sect. [i] that the modifications are needed because we 
want to express that certain compositions of comorphisms are the same. This 
means that we need to specify both compositions and modifications. We hence 
arrive at the following: 


Definition 18. Given an index 2-category Ind, a 2-indexed coinstitution is 
a 2-functor $\mathcal{I}\colon Ind^* \to CoIns$ (for $Ind^*$, see footnote 4) into the 2-category of institutions, institution 
comorphisms and institution comorphism modifications. 














A 2-indexed coinstitution can be flattened, using the so-called Grothendieck 
construction. The basic idea here is that all signatures of all institutions are put 
side by side, and a signature morphism in this large realm of signatures consists 
of an intra-logic signature morphism plus an inter-logic translation (along some 
logic comorphism). The other components (sentences, models, satisfaction) are 
then defined in a straightforward way. 

The Grothendieck construction for indexed institutions has been described 
in [5]; we develop its dual here [13]. In an indexed coinstitution $\mathcal{I}$, we use the 
notation $\mathcal{I}^i = (Sign^i, Sen^i, Mod^i, \models^i)$ for $\mathcal{I}(i)$, $(\Phi^d, \rho^d)$ for the comorphism 
$\mathcal{I}(d)$, and $\mathcal{I}^u$ for the modification $\mathcal{I}(u)$. 


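Definition 19. Given a 2-indexed coinstitution $\mathcal{I}\colon Ind^* \to CoIns$, define the 
Grothendieck institution $\mathcal{I}^\#$ as follows: 

- signatures in $\mathcal{I}^\#$ are pairs $(i, \Sigma)$, where $i \in |Ind|$ and $\Sigma$ is a signature in $\mathcal{I}^i$, 
- signature morphisms $(d, \sigma)\colon (i, \Sigma_1) \to (j, \Sigma_2)$ consist of a morphism $d\colon j \to i \in Ind$ 
  and a signature morphism $\sigma\colon \Phi^d(\Sigma_1) \to \Sigma_2$ in $\mathcal{I}^j$, 
- composition is given by $(d_2, \sigma_2) \circ (d_1, \sigma_1) = (d_1 \circ d_2, \sigma_2 \circ \Phi^{d_2}(\sigma_1))$, 
- $\mathcal{I}^\#(i, \Sigma) = \mathcal{I}^i(\Sigma)$, and $\mathcal{I}^\#(d, \sigma) = \mathcal{I}^i(\Sigma_1) \xrightarrow{\rho^d_{\Sigma_1}} \mathcal{I}^j(\Phi^d(\Sigma_1)) \xrightarrow{\mathcal{I}^j(\sigma)} \mathcal{I}^j(\Sigma_2)$. 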
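
The composition rule for Grothendieck signature morphisms can be made concrete with a small sketch. Everything here is a simplifying assumption for illustration: signatures are sets of symbol names, intra-logic signature morphisms are dictionaries, and the signature translation of a comorphism is an injective renaming of symbols. The class names `Translation` and `GMorphism` are hypothetical. An index morphism is represented directly by the path of translations it is mapped to, so that the contravariant composite $d_1 \circ d_2$ becomes "first translate along $d_1$, then along $d_2$".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Translation:
    """Signature translation of a path of comorphisms, as named symbol renamings."""
    steps: Tuple[Tuple[str, Callable[[str], str]], ...] = ()

    def then(self, other: "Translation") -> "Translation":
        return Translation(self.steps + other.steps)

    def on_symbol(self, symbol: str) -> str:
        for _, rename in self.steps:
            symbol = rename(symbol)
        return symbol

    def on_morphism(self, sigma: Dict[str, str]) -> Dict[str, str]:
        # Assumes every renaming is injective, so the result is again a map.
        return {self.on_symbol(a): self.on_symbol(b) for a, b in sigma.items()}

    def names(self) -> Tuple[str, ...]:
        return tuple(name for name, _ in self.steps)


@dataclass
class GMorphism:
    """A pair (d, sigma): inter-logic translation d plus intra-logic map sigma."""
    d: Translation
    sigma: Dict[str, str]


def compose(g2: GMorphism, g1: GMorphism) -> GMorphism:
    """(d2, s2) o (d1, s1) = (d1 o d2, s2 o Phi^{d2}(s1))."""
    lifted = g2.d.on_morphism(g1.sigma)  # Phi^{d2}(s1)
    return GMorphism(
        d=g1.d.then(g2.d),
        sigma={a: g2.sigma[b] for a, b in lifted.items()},
    )


# Toy data: d1 prefixes symbols, d2 upper-cases them.
g1 = GMorphism(Translation((("prefix", lambda s: "p_" + s),)), {"p_x": "u", "p_y": "v"})
g2 = GMorphism(Translation((("upper", str.upper),)), {"U": "m", "V": "m", "W": "n"})
g = compose(g2, g1)
print(g.d.names(), g.sigma)
```

The domain of the composite map consists of the symbols translated along both steps, which matches the source $\Phi^{d_2}(\Phi^{d_1}(\Sigma_1))$ of the composite signature morphism.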





| 














4 Ind* is the 2-categorical dual of Ind, where both 1-cells and 2-cells are reversed. 
That is, the room $\mathcal{I}^\#(i, \Sigma)$ (consisting of sentences, models and satisfaction) 
for a Grothendieck signature $(i, \Sigma)$ is defined component-wise, while the corridor 
for a Grothendieck signature morphism is obtained by composing the corridor 
given by the inter-institution comorphism with that given by the 
intra-institution signature morphism. We also denote the Grothendieck institution by 
$(Sign^\#, Sen^\#, Mod^\#, \models^\#)$. 

While the comorphism-based Grothendieck construction satisfies nearly all 
of our needs, one problem remains. Sometimes, the Grothendieck construction 
makes too many distinctions between signature morphisms (cf. Fig. 1). Therefore, 
we use the institution comorphism modifications to obtain a congruence on 
Grothendieck signature morphisms: the congruence is generated by 





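\[ (d', \mathcal{I}^u_\Sigma\colon \Phi^{d'}(\Sigma) \to \Phi^{d}(\Sigma)) \equiv (d, id\colon \Phi^{d}(\Sigma) \to \Phi^{d}(\Sigma)) \qquad (1) \]

relating morphisms from $(i, \Sigma)$ to $(j, \Phi^{d}(\Sigma))$, for $\Sigma \in |Sign^i|$, $d, d'\colon j \to i \in Ind$, 
and $u\colon d \Rightarrow d' \in Ind$. We will later examine what is really added by the 
congruence closure. But first, let us state the following crucial property: 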


Proposition 20. $\equiv$ is contained in the kernel of $\mathcal{I}^\#$ (considered as a functor). 














Let $q^{\mathcal{I}}\colon Sign^\# \to Sign^\#/{\equiv}$ be the quotient functor induced by $\equiv$ (see [12] 
for the definition of quotient category). Note that it is the identity on objects. 
We easily obtain that the functor $\mathcal{I}^\#$ factors through the quotient category 
$Sign^\#/{\equiv}$: 


Corollary 21. $\mathcal{I}^\#\colon Sign^\# \to InsRoom$ leads to a quotient Grothendieck 
institution $\mathcal{I}^\#/{\equiv}\colon Sign^\#/{\equiv} \to InsRoom$. 














By abuse of notation, we denote $\mathcal{I}^\#/{\equiv}$ by $(Sign^\#/{\equiv}, Sen^\#, Mod^\#, \models^\#)$. 

When considering e.g. the comorphism going from partial first-order logic 
$PFOL^=$ to first-order logic $FOL^=$, and the composite comorphism going from 
$PFOL^=$ to CASL and then to $FOL^=$, we end up with different comorphisms, which 
are however related by a comorphism modification. The above identification 
process in the Grothendieck institution now tells us that it does not matter 
which way we choose. 

In some cases, the congruence = can be described succinctly: 





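Proposition 22. Assume that $Ind^*$ has cocones for diagrams of 2-cells of shape 
$\bullet \longleftarrow \bullet \longrightarrow \bullet$ that are mapped to pushouts of 2-cells in CoIns. Then the 
congruence $\equiv$ defined above is explicitly given by 

\[ (d_1, \sigma \circ \mathcal{I}^{u_1}_\Sigma) \equiv (d_2, \sigma \circ \mathcal{I}^{u_2}_\Sigma) \]

for $\Sigma \in |Sign^i|$, $d, d_1, d_2\colon j \to i \in Ind$, $\sigma\colon \Phi^{d}(\Sigma) \to \Sigma' \in Sign^j$ and $u_1\colon d \Rightarrow d_1$, 
$u_2\colon d \Rightarrow d_2 \in Ind$. 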
Note that according to Prop. [16], under relatively mild assumptions, pushouts 
of 2-cells in CoIns exist. Hence, the assumption of the above proposition that $Ind^*$ has 
cocones for diagrams of 2-cells of shape $\bullet \longleftarrow \bullet \longrightarrow \bullet$ that are mapped to 
pushouts of 2-cells in CoIns is quite realistic. In particular, it is possible to add 
suitable cocones to Hom-categories in $Ind^*$ and interpret these as pushouts in 
CoIns. 


6 Amalgamation and Exactness 


The amalgamation property (called ‘exactness’ in [6]) is a major technical 
assumption in the study of specification semantics [20] and is important in many 
respects. It allows the computation of normal forms for specifications [12], and 
it is a prerequisite for good behaviour w.r.t. parameterization, conservative 
extensions [6] and proof systems [16]. 


Definition 23. A cocone for a diagram in Sign is called (weakly) amalgamable 
if it is mapped to a (weak) limit under Mod. I (or Mod) admits (finite) (weak) 
amalgamation if (finite) colimit cocones are (weakly) amalgamable, i.e. if Mod 
maps (finite) colimits to (weak) limits. This property is also called (weak) exact- 
ness, while (weak) semi-exactness is its restriction to pushout diagrams. 














More generally, given a diagram $D\colon J \to Sign^I$, a family of models $(M_j)_{j \in |J|}$ 
is called D-consistent if $M_k|_{D(\delta)} = M_j$ for each $\delta\colon j \to k \in J$. A cocone 
$(\Sigma, (\mu_j)_{j \in |J|})$ over the diagram $D\colon J \to Sign^I$ is called weakly amalgamable 
if for each D-consistent family of models $(M_j)_{j \in |J|}$, there is a $\Sigma$-model $M$ with 
$M|_{\mu_j} = M_j$ ($j \in |J|$). If this model is unique, the cocone is called amalgamable. 
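
To see what amalgamation means in the simplest possible setting, here is a minimal sketch in a toy institution that is not one of the institutions discussed in this paper. It is assumed that signatures are finite sets of symbols, signature morphisms are inclusions, models are dictionaries assigning a value to each symbol, and reduct is restriction. For the pushout of two inclusions $\Sigma_1 \supseteq \Sigma_1 \cap \Sigma_2 \subseteq \Sigma_2$ (the union), a consistent pair of models amalgamates uniquely. The function names `reduct` and `amalgamate` are hypothetical.

```python
def reduct(model, signature):
    """Reduct along an inclusion: forget all symbols outside the signature."""
    return {symbol: model[symbol] for symbol in signature}


def amalgamate(sig1, model1, sig2, model2):
    """Amalgamate two models over the union of their signatures.

    Returns None if the models are not consistent, i.e. if their
    reducts to the shared signature differ.
    """
    shared = sig1 & sig2
    if reduct(model1, shared) != reduct(model2, shared):
        return None
    return {**model1, **model2}


sig1, sig2 = {"nat", "plus"}, {"nat", "times"}
m1 = {"nat": "N", "plus": "+"}
m2 = {"nat": "N", "times": "*"}
m = amalgamate(sig1, m1, sig2, m2)
print(m)
```

The amalgamated model reduces to each of the given models, and no other model over the union does, which is the uniqueness required for (non-weak) amalgamability in this toy setting.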


Proposition 24. An institution admits (weak) amalgamation iff each colimiting 
cocone in the category of signatures is (weakly) amalgamable. 














A further weakening just requires the existence of weakly amalgamable co- 
cones: 


Definition 25. Call an institution I quasi-exact if for each diagram $D\colon J \to Sign^I$, 
there is some weakly amalgamable cocone over D. Quasi-semi-exactness 
is the restriction of this notion to diagrams of shape $\bullet \longleftarrow \bullet \longrightarrow \bullet$. 


The importance of this definition lies in the fact that it 


1. interacts quite nicely with heterogeneous specification (the property holds 
for Grothendieck institutions under very mild and practically feasible as- 
sumptions), and it 

2. is a prerequisite for the (soundness and completeness of the) proof calculus 
of development graphs [15,16]. 


The theory of amalgamation and exactness in Grothendieck institutions for 
indexed institutions has been developed by Diaconescu [5]. Actually, the 
corresponding theory for indexed coinstitutions turns out to be much simpler [13]. 
Theorem 26. Let $\mathcal{I}\colon Ind^{op} \to CoIns$ be an indexed coinstitution and K be 
some small category such that 


1. Ind is K-complete, 

2. $\Phi^d$ is K-cocontinuous for each $d\colon i \to j \in Ind$, and 

3. the indexed category of signatures of $\mathcal{I}$ is locally K-cocomplete (the latter 
meaning that $Sign^i$ is K-cocomplete for each $i \in |Ind|$). 


Then the signature category $Sign^\#$ of the Grothendieck institution has K-colimits. 














We cannot expect that this result directly carries over to the quotient 
Grothendieck institution, since quotients of categories generally do not 
interact well with colimits. However, we can say something provided that we work 
with a quotient of the index category Ind: 


Proposition and Definition 27. Given a 2-category Ind, the relation of being 
in the same connected component of a Hom-category defines a congruence $\equiv$ on 
the objects of the Hom-categories, i.e. the morphisms of Ind. $Ind/{\equiv}$ is the 
corresponding quotient 1-category. 
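
For a single Hom-category given by finite data, the equivalence classes of this congruence can be computed with a standard union-find pass, treating every 2-cell as an undirected edge between its source and target 1-cell. The sketch below is only an illustration under that finiteness assumption; the function name `connected_components` and the sample 1-cells are hypothetical, and compatibility with composition (the content of the proposition) is not checked by the code.

```python
def connected_components(one_cells, two_cells):
    """Group the 1-cells of one Hom-category by connected component.

    one_cells: iterable of hashable names of 1-cells (index morphisms)
    two_cells: iterable of (source, target) pairs, one per 2-cell
    """
    one_cells = list(one_cells)
    parent = {d: d for d in one_cells}

    def find(d):
        # Path-halving find.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for source, target in two_cells:
        parent[find(source)] = find(target)

    classes = {}
    for d in one_cells:
        classes.setdefault(find(d), set()).add(d)
    return list(classes.values())


# A span of 2-cells d1 <= d => d2 puts d, d1 and d2 into one class.
components = connected_components(
    ["d", "d1", "d2", "e"],
    [("d", "d1"), ("d", "d2")],
)
print(components)
```

Two index morphisms are then identified in the quotient exactly when they land in the same set of the returned list.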














Lemma 28. Given a 2-indexed coinstitution $\mathcal{I}\colon Ind^* \to CoIns$, if $(d_1, \sigma_1) \equiv 
(d_2, \sigma_2)$ in $Sign^\#$, then $d_1 \equiv d_2$. 














Proposition 29. Assume that $Ind^*$ has cocones for diagrams of 2-cells of shape 
$\bullet \longleftarrow \bullet \longrightarrow \bullet$ that are mapped to pushouts of 2-cells in CoIns. Then the 
congruence $\equiv$ in Ind defined above is explicitly given by $d_1\colon i \to j \equiv d_2\colon i \to j$ 
iff there exist $d\colon i \to j \in Ind$ and $u_1\colon d \Rightarrow d_1$, $u_2\colon d \Rightarrow d_2 \in Ind$. 














Theorem 30. Let $\mathcal{I}\colon Ind^* \to CoIns$ be a 2-indexed coinstitution such that 


1. $Ind/{\equiv}$ is K-complete for some small category K, 

2. each connected component (considered as a subcategory) of a Hom-category 
Ind(i, j) has a distinguished canonical weakly terminal object, such that these 
canonical objects are stable under composition, 

3. $(d, \sigma_1) \equiv (d, \sigma_2)$ in $Sign^\#$ implies $\sigma_1 = \sigma_2$, 

4. $\Phi^d$ is K-cocontinuous for each $d\colon i \to j \in Ind$, and 

5. the indexed category of signatures of $\mathcal{I}$ is locally K-cocomplete. 


Then the signature category $Sign^\#/{\equiv}$ of the quotient Grothendieck institution 
has K-colimits. (Note that assumptions 2 and 3 are vacuous in case of discrete 
Hom-categories; we then get Theorem [26] as a special case.) 














By contravariance of $\mathcal{I}$, assumption 2 of the above theorem means that if 
institution comorphisms are linked by modifications, there is always a “smallest” 
comorphism that can be embedded into the other ones. This is quite realistic 
in practice. However, it is not so realistic to assume that these smallest 
comorphisms are stable under composition. For example, the composition of the 
smallest embedding of $FOL^=$ into CASL with the smallest embedding of CASL 
into second-order logic will not give the smallest embedding of $FOL^=$ into 
second-order logic, but rather a more complex one. 

Assumption 3 basically means that the congruence does not identify signature 
morphisms within one institution, i.e. that each signature category $Sign^i$ 
is faithfully embedded into $Sign^\#/{\equiv}$. This assumption is a reasonable and 
desirable property in practice. We record this explicitly: 


Proposition 31. $emb^i\colon Sign^i \to Sign^\#/{\equiv}$ is an embedding preserving colimits 
under the assumptions of Theorem [30]. 














Let us now come to exactness. We extend the notion of semi-exactness to 
comorphisms and to the indexed case. An institution comorphism $(\Phi, \alpha, \beta)$ is 
called (weakly) exact, if the naturality squares for $\beta$ are (weak) pullbacks. A 
2-indexed coinstitution $\mathcal{I}\colon Ind^* \to CoIns$ is called (weakly) locally semi-exact, if 
each institution $\mathcal{I}^i$ is (weakly) semi-exact ($i \in |Ind|$). Assuming that equivalence 
classes of 2-cells have canonical weakly terminal objects, $\mathcal{I}$ is called (weakly) 
semi-exact if for each pullback in $Ind/{\equiv}$ 


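\[
\begin{array}{ccc}
i & \xleftarrow{\ [d_1]\ } & j_1 \\
\uparrow{\scriptstyle [d_2]} & & \uparrow{\scriptstyle [e_1]} \\
j_2 & \xleftarrow{\ [e_2]\ } & k
\end{array}
\]

the square 

\[
\begin{array}{ccc}
Mod^{i}(\Sigma) & \xleftarrow{\ \beta^{d_1}_\Sigma\ } & Mod^{j_1}(\Phi^{d_1}(\Sigma)) \\
\uparrow{\scriptstyle \beta^{d_2}_\Sigma} & & \uparrow{\scriptstyle \beta^{e_1}} \\
Mod^{j_2}(\Phi^{d_2}(\Sigma)) & \xleftarrow{\ \beta^{e_2}\ } & Mod^{k}(\Phi^{e_2}(\Phi^{d_2}(\Sigma))) = Mod^{k}(\Phi^{e_1}(\Phi^{d_1}(\Sigma)))
\end{array}
\]

is a (weak) pullback for each signature $\Sigma$ in $Sign^i$, where canonical weakly 
terminal representatives are used (see footnote 5). 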


Theorem 32. Assume that the 2-indexed coinstitution $\mathcal{I}\colon Ind^* \to CoIns$ fulfills 
the assumptions of Theorem [30]. Then the quotient Grothendieck institution 
$\mathcal{I}^\#/{\equiv}$ is (weakly) semi-exact if and only if 

1. $\mathcal{I}$ is (weakly) locally semi-exact, 

2. $\mathcal{I}$ is (weakly) semi-exact, and 

3. for all canonical weakly terminal $d\colon i \to j \in Ind$, $\mathcal{I}^d$ is (weakly) exact. 














5 It might be useful to weaken these notions in the way such that model morphisms 
are ignored. 
Theorems [26], [30] and [32] already provide a good theoretical basis for 
heterogeneous specification. However, in some cases, these theorems are not general 
enough: given a diagram J → Ind, its limit must be the index of some institution 
that can serve to encode (via comorphisms) all the institutions indexed 
by the diagram. While the existence of such an institution may not be a problem 
(e.g. higher-order logic often serves as such a “universal” logic for coding 
other logics), the uniqueness condition imposed by the limit property is more 
problematic. This means that any two such “universal” institutions must have 
isomorphic indices and hence be isomorphic themselves. This might work well 
in some circumstances, but may not be desirable in others: after all, a number of 
non-isomorphic logics, such as classical higher-order logic, the calculus of 
constructions and rewriting logic have been proposed as such a “universal” logic (see footnote 6). 

A related problem (see footnote 7) is that the assumptions of Theorem [32] are too strong 
to be met for all practical examples. E.g. the CASL institution is not weakly 
semi-exact, and its encoding into $HOL^=$ [14] is neither exact, nor does it have 
a cocontinuous signature translation. 

We hence now generalize the previous results by replacing weak exactness 
with quasi-exactness, i.e. amalgamable colimits with weakly amalgamable co- 
cones, and thereby dropping the uniqueness requirement. Hence, several non- 
isomorphic “universal” institutions may coexist peacefully with our approach, 
and also non-exact institutions and comorphisms may be included in the indexed 
coinstitution serving as basis for heterogeneous specification. 

We first extend Def. [25] to indexed coinstitutions: 


Definition 33. An indexed coinstitution $\mathcal{I}\colon Ind^{op} \to CoIns$ is called locally 
quasi-exact, if each institution $\mathcal{I}^i$ is quasi-exact ($i \in |Ind|$). It is called 
quasi-exact, if for each diagram $D\colon J \to Ind$, there is some cone $(l, (d_j)_{j \in |J|})$ over 
D whose image under $\mathcal{I}$ is weakly amalgamable. Quasi-semi-exactness is the 
restriction of these notions to diagrams of shape $\bullet \longleftarrow \bullet \longrightarrow \bullet$. 














However, for the index level, even quasi-exactness may be too strong. Con- 
sider the diagram 


[Diagram: the span MODALCASL ← CASL → COCASL.] 


How do we obtain a weakly amalgamable cocone? A simple way is to use 
the embedding of MODALCASL into CASL and compose it with the inclusion of 
CASL into COCASL: 


6 This problem can possibly be circumvented by formally adjoining limits to the index 
category, which are then interpreted using Grothendieck institutions over 
subdiagrams. However, this would add considerable complexity to the construction. 

7 This problem has already been noted by Diaconescu [5] for his more special version 
of Theorem [32]; see [?] for why we consider it to be more special. 
[Diagram: the span MODALCASL ← CASL → COCASL completed to a square with bottom corner COCASL, going from MODALCASL to COCASL via CASL.] 


but the resulting square does not even commute (see footnote 8). The reason is that on the way 
from CASL to COCASL via MODALCASL, MODALCASL adds an implicit set of 
worlds, which is made explicit by the embedding of MODALCASL into CASL (see footnote 9). To 
obtain a commuting square, we would need to have a comorphism from COCASL 
to itself which adds an explicit set of worlds. However, this solution is rather 
inelegant, since it means that for any (present or future) extension of CASL without 
possible-world semantics (e.g. for HASCASL), we need a similar comorphism. 


We hence prefer to split the square into two lax triangles: 


[Diagram: the same square with corners CASL, MODALCASL, COCASL and COCASL, now split into two lax triangles.] 








and indeed, the square is weakly amalgamable in the following sense: 


Definition 34. Given a 2-indexed coinstitution $\mathcal{I}\colon Ind^* \to CoIns$, a square 
consisting of two lax triangles of index morphisms 


8 Of course, we could also embed everything into HOL, which would not cause any 
relevant change to the subsequent discussion. 

9 See [15] for the reason why the set of worlds cannot be omitted even for models of 
signatures without modalities. 
is called (weakly) amalgamable, if the following diagram is a (weak) pullback 





Mod'(5) < = Mod?(6"()) 
Bt [ae 
pg Mod" (6#(5:)) ne Moa (61 (64(55))) 
[Moat ay : 
Mod’? (82(5)) Ë Mod" (2(6())) < ee 


where the lower right square is a pullback. That is, each pair consisting of a 
PL(X)- and a PY(X)-model with the same X-reduct is (weakly) amalgamable 
to a pair consisting of a BP? (6 (S))- and a P(B ())-model having the same 
4(3))-reduct. 


$\mathcal{I}$ is called lax-quasi-exact, if for each pair of arrows $j_1 \xrightarrow{d_1} i \xleftarrow{d_2} j_2$ in 
Ind, there is some square 
i 


jl = | <= J2 
k 


consisting of a weakly amalgamable square of lax triangles, such that additionally 
T is quasi-semi-exact. 














Note that this property is different from (and indeed, incomparable to) amal- 
gamability of the individual lax triangles: 


Definition 35. Given a 2-indexed coinstitution $\mathcal{I}\colon Ind^* \to CoIns$, a lax 
triangle of index morphisms 

[Diagram: lax triangle of index morphisms in Ind.] 

is called (weakly) amalgamable, if $\mathcal{I}$ maps it to a (weakly) amalgamable lax 
triangle in the sense of Definition [?]. 
Theorem 36. For a 2-indexed coinstitution $\mathcal{I}\colon Ind^* \to CoIns$, assume that 


- $\mathcal{I}$ is lax-quasi-exact, and 
- all institution comorphisms in $\mathcal{I}$ are weakly exact. 


Then $\mathcal{I}^\#/{\equiv}$ is quasi-semi-exact. 





Call a diagram acyclic (connected) if the graph underlying its index category 
is acyclic (connected) when the identity arrows are deleted. 


Corollary 37. Let $\mathcal{I}$ satisfy the assumptions of Theorem [36]. Then $\mathcal{I}^\#/{\equiv}$ admits 
weak amalgamation of finite acyclic connected diagrams. 














As stated above, the importance of these results lies in the fact that 
quasi-(semi-)exactness is a prerequisite for the (soundness and completeness of the) 
proof calculus of development graphs [15,16]. Due to lack of space, we cannot go 
into the details here. Instead, we consider a typical situation of a view (or a 
refinement) involving hiding, which illustrates a simple application 
of the rule Theorem-Hide-Shift from the calculus of [15,16]. 

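Proposition 38. In an institution, let a span of theories 
\[ (\Sigma_1, \Psi_1) \xleftarrow{\ \sigma_1\ } \Sigma \xrightarrow{\ \sigma_2\ } (\Sigma_2, \Psi_2) \]
be given. Then the refinement statement 
\[ Mod(\sigma_1)^{-1}(Mod(\sigma_2)(|Mod(\Sigma_2, \Psi_2)|)) \subseteq |Mod(\Sigma_1, \Psi_1)| \]
follows from (and, hence can be reduced to) the statement 
\[ Mod(\Sigma_3, \theta_2(\Psi_2)) \subseteq Mod(\Sigma_3, \theta_1(\Psi_1)) \]
provided that 
\[
\begin{array}{ccc}
\Sigma & \xrightarrow{\ \sigma_2\ } & \Sigma_2 \\
\downarrow{\scriptstyle \sigma_1} & & \downarrow{\scriptstyle \theta_2} \\
\Sigma_1 & \xrightarrow{\ \theta_1\ } & \Sigma_3
\end{array}
\]
is a weakly amalgamable square. 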
7 From Specifications to Programs 


Consider a specification SortSpec of sorting written in CASL (let it have signature 
$\Sigma_S$), and a sorting program SortProg written in PLNG (let it have signature 
$\Sigma_P$). We can use the institution semi-morphism toCASL: PLNG → CASL from 
Example [8] to express that SortProg is an implementation of SortSpec. Let $(\Phi, \beta)$ 
be toCASL decomposed into its signature and model translation components. Then 
the property that we need to express is 

\[ \beta_{\Sigma_P}(Mod^{PLNG}(SortProg)) \subseteq Mod^{CASL}(SortSpec) \]

assuming that $\Phi(\Sigma_P) = \Sigma_S$ (if needed, we can ensure this property by massaging 
the CASL specification appropriately). 

Now the question arises how to prove this property. It would be easy if 
toCASL could be extended to an institution morphism; however, there is no 
hope to translate CASL formulas into programs. Still, we can split the 
semi-morphism toCASL = $(\Phi, \beta)$ into a span of comorphisms 





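\[ PLNG \xleftarrow{\ toCASL^-\ } CASL \circ \Phi \xrightarrow{\ toCASL^+\ } CASL \]
as follows: 

[Diagram: $toCASL^-$ has the identity on $Sign^{PLNG}$ as signature translation and $\beta$ as model translation; $toCASL^+$ has $\Phi$ as signature translation and the identity on $Mod^{CASL} \circ \Phi^{op}$ as model translation.] 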





Here, the “middle” institution $CASL \circ \Phi$ is the institution with signature category 
inherited from PLNG, no sentences, and models inherited from CASL via $\Phi$. 
Our refinement statement can now be reformulated in terms of comorphisms: 

\[ (\beta^{toCASL^+}_{\Sigma_P})^{-1}(\beta^{toCASL^-}_{\Sigma_P}(Mod^{PLNG}(SortProg))) \subseteq Mod^{CASL}(SortSpec) \]


We can regard this in a suitable Grothendieck institution; then it has exactly 
the form of the statement in Prop. [38]. We hence can reformulate the 
statement, provided that we have quasi-semi-exactness. By Theorem [36], we 
need lax-quasi-exactness of the indexed coinstitution. The essential ingredient 
is to find a square of two weakly amalgamable lax triangles for the span 

\[ PLNG \xleftarrow{\ toCASL^-\ } CASL \circ \Phi \xrightarrow{\ toCASL^+\ } CASL. \]

But this can e.g. be given 
by coding both CASL and PLNG into a common logic such as higher-order 
logic (indexing institutions and comorphisms by themselves): 


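[Diagram: the span PLNG ← CASL∘Φ → CASL (via $toCASL^-$ and $toCASL^+$) completed to a square of two lax triangles with apex HOL, using the codings of PLNG and CASL into HOL and a modification $\theta$.] 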
By Theorem [36], this leads to a weakly amalgamable square in the Grothendieck 
institution: 

[Diagram: square with top corner $(CASL \circ \Phi, \Sigma_P)$, middle corners $(PLNG, \Sigma_P)$ and $(CASL, \Sigma_S)$, and bottom corner $(HOL, PLNG2HOL(\Sigma_P))$; the upper arrows are induced by $toCASL^-$ and $toCASL^+$, the lower arrows by the codings into HOL.] 


By Prop. [38], our refinement statement can now be reformulated as follows: 

\[ Mod^{HOL}(PLNG2HOL(SortProg)) \subseteq Mod^{HOL}(\theta(CASL2HOL(SortSpec))) \]
which amounts to proving, in HOL, 
\[ PLNG2HOL(SortProg) \vdash \theta(CASL2HOL(SortSpec)). \]


An implementation of this machinery for the case PLNG=Haskell is under 
way, to become part of the Heterogeneous Tool Set HETS [15,17]. 
A Proofs of the Theorems 


Proof of Prop. [16]. Given comorphisms $(\Phi_i, \rho_i)\colon I \to J$ ($i = 1, 2, 3$) and a span 
of modifications 

[Diagram: span of modifications from $(\Phi_1, \rho_1)$ to $(\Phi_2, \rho_2)$ and to $(\Phi_3, \rho_3)$.] 
construct the signature component $\Phi(\Sigma)$ of the resulting comorphism as the 
pushout 


©, (2) 

(71) 3 N 
B(X) D(X) 
(aa may g (Oi) = 
P(x’) 


By the universal property of the pushout, this extends to a functor ®: Sign’ — 
Sign’ such that 81: P —>@ and 09: Po —> P become natural transformations. 


ZS 


Jod, Se, Joĝ 


N e 


We can then define room component of the pushout comorphism p: I — Jo ® to 
be J -82 0 p2 = J - 01 o p3, and the cocone consisting of 0): (3, p3) => (®, p) and 
02: (B2, p2) = (®, p) is easily seen to satisfy the universal property of a pushout. 

The proof for coproducts, coequalizers or arbitrary non-empty colimits of 
connected diagrams is very similar. 














Proof of Prop. [17]. The initial institution morphism $(\Phi, \mu)\colon I \to J$ is defined 
by letting $\Phi(\Sigma)$ be the initial signature, and $\mu_\Sigma$ consist of the empty map of 
sentences and the unique functor into the terminal model category. 














Proof of Prop. [20]. By the definition of comorphism modification, $(\mathcal{I}^j \cdot \mathcal{I}^u) \circ 
\rho^{d'} = \rho^{d}$. But this just means that equivalent signature morphisms induce the 
same corridors. 














Proof of Prop. It is easy to see that the above relation is contained in 
the relation generated by (1): just apply (1) twice. It remains to show that the 
above relation is a congruence. Reflexivity and symmetry are clear. Concerning 
transitivity, assume that 


(di, 01 OLS") = (dz, 01 0 L$?) = (dg, 02 ° TS) = (ds, 02 0T), 
the first relation being witnessed by $u_1\colon d_2 \Rightarrow d_1$, $u_2\colon d_2 \Rightarrow d_3$, and the second 
by $u_3\colon d_4 \Rightarrow d_3$, $u_4\colon d_4 \Rightarrow d_5$. Take the pullback in Ind(j, i) of the two spans 


dı d3 ds 
də d4 
* 1 


u` a ai 
d 


By the construction of pushouts of 2-cells in CoIns (see Prop[I6), the middle 
square in 


P(X) p(X) b4s(5)) 
2(5) 
Ty 
o"(2) 





y 
yy 
is a pushout, and the mediating morphism ø leads to the desired form 
(dy, 01 o T4) = (dy,0 oT!) = (ds, 0 o T°) = (ds, o2 o TY). 
Concerning composition, assume that 
(di,0 o Ty") = (d2,0 o T$") 
via uy: d => dı, u2 : d > d2, and 
(e1, T oTr) = (e2, TOTS) 


via vı : € > €1,V2 : e€ > eg. Then for k = 1,2, 


(ek, 0 oT) o (dk, T oT) 
= (dp o ek, 0 o TS o P(T) o P(T )) (def. Grothendieck composition) 
= (dk 0 ek, o 0 O° (T) o Pêr (TS) o Tpi (xn) (naturality of Z“*) 
= (dk © ek, o o P (T) o T”) (functoriality of Z) 








which shows that we arrive at the desired form. 
Proof of Thm. [26]. Apply Theorem 1 of [22] with $C_i = Sign^i$ and $C_m = \Phi^m$. 
Note that $Sign^\#$ is then $Flat(C^{op})^{op}$. 








Proof of Lemma [28]. Easy induction over the definition of $(d_1, \sigma_1) \equiv (d_2, \sigma_2)$. 




















Proof of Prop. [29]. Analogous to the proof of Prop. [22]. 


Proof of Thm. The proof idea follows that of Theorem 1 in [22], the 
necessary modifications being caused by the congruences. By assumption 2, we 
can always choose representatives $d \in Ind$ of congruence classes $[d] \in Ind/{\equiv}$ 
in such a way that d is a canonical weakly terminal object. Similarly, we can 
always choose representatives $(d, \sigma)$ of congruence classes $[(d, \sigma)]$ in $Sign^\#/{\equiv}$ 
in such a way that d is the canonical weakly terminal object in its connected 
component: given an arbitrary $(d, \sigma\colon \Phi^d(\Sigma) \to \Sigma')$ in $Sign^\#$, let $u\colon d \Rightarrow t$ be a 
2-cell into the canonical weakly terminal object. Then $(t, \sigma \circ \mathcal{I}^u_\Sigma)$ is equivalent 
to $(d, \sigma)$. 

Given a diagram D: K —> Sign* /=, we introduce the notation (ip, Xp) for 
D(k) (k € |K|) and [(dm,om)]: (ik, Xk) — (in, Xe) for D(m) (m: k — k' € 
K). Let D: K — Ind/ = be the projection of D to the first component; by 
Lemma 28] this is a well-defined diagram in Ind/=. By assumption [] D has a 
limit ([mx]:4 — te) eel K]- 

Let the diagram G: K — Sign’ be defined by 


G(k) = P+ (Xp) (k € |K) 
G(m) = Pr (om) (m: k'—k € K) 


Note that m x is chosen to be canonical weakly terminal in [mx]. By assumption] 
G has a colimit (og: G(k) — ©)pe|K|. We show that ([(mx, o%)]: (ik, Xk) — 
(i, X))kejx] is a colimit of D. 

Since equality implies congruence, ([(mk, ok)])kejx] is a cocone of D. Let 
([(nx, Ox)]: (ik, Zr) — (i, ©")) kei x) be another cocone. By Lemma[2§] ([nx]: i! — 
ik)ke|K| is a cocone for D. Hence there is a unique [d]: i’ — i with [mx] © [d] = 
[np]. Since we choose representatives canonically in a way closed under compo- 
sition, Mk od = nr. 

By assumption J] (©“(c%))xejx) is a colimit of 4 o G. Note that the source 
of &4(o;,) is 4(G(k)) = 64(G™ (X,)) = P (Xp). By the cocone property of 
([(nk, 9) \)ke|K|> (nk, Ok) = (dm o Nng, Opi o PMR! (om)) for mzk — k' € K. By 
the assumption of weakly terminal canonical representatives, nk = dm ong. By 
assumption] 0; = 04,08" (om). This shows that (0p: P”? (Xk) — X')kejgj isa 
cocone for P4 o G. Hence, there is a unique 7: P(X) — XY” with ro O4(c,) = Og. 
Then [(d,7)]: (i, X) — (i', 5") is a unique morphism in Sign* / = such that 
((d,7)] © [(m%, 0%)] = [(r, be). 


Proof of Prop. [31]. Clearly, $emb^i$ is injective on objects. Faithfulness follows 
from assumption 3. Preservation of colimits can be seen by inspecting the 
construction in the proof of Theorem [30]: if the indices are all i, then the colimit is 
just that in $Sign^i$. 
Proof of Thm. [32]. “Only if”, 1: Following Prop. 2 in [5], it is easy to see that 
for each $i \in |Ind|$, the model functor $Mod^i$ is the restriction $Mod^\#(i, \_)$ of the 
model functor of the Grothendieck institution to the subcategory $Sign^i$ of the 
Grothendieck signature category $Sign^\#/{\equiv}$. 


bt 


(Signi)? — -> (Sign* /=)°P 
Mod’ visat 
CAT 


By Prop. BI] the canonical injection emb’: Sign’ — Sign” preserves colimits, 
hence Mod’ takes pushouts to (weak) pullbacks because Mod” does so. 
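“Only if”, 2: Given a pullback in $Ind/{\equiv}$

\[
\begin{array}{ccc}
k & \xrightarrow{[e_1]} & j_1 \\
{\scriptstyle [e_2]}\downarrow & & \downarrow{\scriptstyle [d_1]} \\
j_2 & \xrightarrow{[d_2]} & i
\end{array}
\]

choose $d_1, d_2, e_1, e_2$ canonically. By the construction of colimits in Theorem 30,
for any signature $\Sigma$ in $Sign^{i}$,

\[
\begin{array}{ccc}
(i, \Sigma) & \xrightarrow{[(d_1, id)]} & (j_1, \Phi^{d_1}(\Sigma)) \\
{\scriptstyle [(d_2, id)]}\downarrow & & \downarrow{\scriptstyle [(e_1, id)]} \\
(j_2, \Phi^{d_2}(\Sigma)) & \xrightarrow{[(e_2, id)]} & (k, \Phi^{e_1}(\Phi^{d_1}(\Sigma))) = (k, \Phi^{e_2}(\Phi^{d_2}(\Sigma)))
\end{array}
\]

is a pushout in $Sign^{\#}/{\equiv}$ and is therefore mapped to a (weak) pullback by the
model functor. This gives exactly the desired property.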

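“Only if”, 3: Let $d\colon j \to i$ be canonical and $\sigma\colon \Sigma_1 \to \Sigma_2$ a signature
morphism in $Sign^{i}$. By the construction of colimits in Theorem 30,

\[
\begin{array}{ccc}
(i, \Sigma_1) & \xrightarrow{[(id, \sigma)]} & (i, \Sigma_2) \\
{\scriptstyle [(d, id)]}\downarrow & & \downarrow{\scriptstyle [(d, id)]} \\
(j, \Phi^{d}(\Sigma_1)) & \xrightarrow{[(id, \Phi^{d}(\sigma))]} & (j, \Phi^{d}(\Sigma_2))
\end{array}
\]

is a pushout in $Sign^{\#}/{\equiv}$ and is therefore mapped to a (weak) pullback by the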
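model functor. Again, this gives exactly the desired property.

“If”: Consider an arbitrary pushout in $Sign^{\#}/{\equiv}$

\[
\begin{array}{ccc}
(i, \Sigma_0) & \xrightarrow{[(d_1, \sigma_1)]} & (j_1, \Sigma_1) \\
{\scriptstyle [(d_2, \sigma_2)]}\downarrow & & \downarrow{\scriptstyle [(e_1, \theta_1)]} \\
(j_2, \Sigma_2) & \xrightarrow{[(e_2, \theta_2)]} & (k, \Sigma')
\end{array}
\]

and assume that representatives are chosen canonically. By the construction of
colimits in Theorem 30, the above pushout can be expressed as the following
composition of four pushout squares:

\[
\begin{array}{ccccc}
(i, \Sigma_0) & \xrightarrow{[(d_1, id)]} & (j_1, \Phi^{d_1}(\Sigma_0)) & \xrightarrow{[(id, \sigma_1)]} & (j_1, \Sigma_1) \\
\downarrow{\scriptstyle [(d_2, id)]} & & \downarrow{\scriptstyle [(e_1, id)]} & & \downarrow{\scriptstyle [(e_1, id)]} \\
(j_2, \Phi^{d_2}(\Sigma_0)) & \xrightarrow{[(e_2, id)]} & (k, \Phi^{e_1}(\Phi^{d_1}(\Sigma_0))) = (k, \Phi^{e_2}(\Phi^{d_2}(\Sigma_0))) & \xrightarrow{[(id, \Phi^{e_1}(\sigma_1))]} & (k, \Phi^{e_1}(\Sigma_1)) \\
\downarrow{\scriptstyle [(id, \sigma_2)]} & & \downarrow{\scriptstyle [(id, \Phi^{e_2}(\sigma_2))]} & & \downarrow{\scriptstyle [(id, \theta_1)]} \\
(j_2, \Sigma_2) & \xrightarrow{[(e_2, id)]} & (k, \Phi^{e_2}(\Sigma_2)) & \xrightarrow{[(id, \theta_2)]} & (k, \Sigma')
\end{array}
\]

Now the model functor of the quotient Grothendieck institution maps the upper
left pushout to a (weak) pullback because the 2-indexed coinstitution is (weakly)
semi-exact, maps the lower right pushout to a (weak) pullback because the
2-indexed coinstitution is (weakly) locally semi-exact, and maps the remaining
two squares to (weak) pullbacks because the comorphisms for canonical index
morphisms are (weakly) exact. Since (weak) pullback squares compose, the result
follows.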














Proof of Thm. 


Let a diagram $(j_1, \Sigma_1) \xleftarrow{(d_1, \sigma_1)} (i, \Sigma) \xrightarrow{(d_2, \sigma_2)} (j_2, \Sigma_2)$ in $Sign^{\#}$
be given. Let

[Diagram: a square in $Ind$ over $d_1$ and $d_2$, formed by two lax triangles; the remaining labels are not recoverable from the source.]

be a weakly amalgamable square of two lax triangles with Z* quasi-semi-exact.
By the latter property, there are $\theta_1, \theta_2$ such that

[Diagram: a square of signature morphisms completed by $\theta_1$ and $\theta_2$; not recoverable from the source.]

is a weakly amalgamable square, which leads to weak amalgamability of the
lower right square in

[Diagram: a grid of squares in $Sign^{\#}/{\equiv}$ with top row $(i, \Sigma) \to (j_1, \Phi^{d_1}(\Sigma)) \to (j_1, \Sigma_1)$, left column ending in $(j_2, \Sigma_2)$, and lower right corner $(k, \Sigma')$; the inner nodes and arrow labels are not recoverable from the source.]

The upper right and lower left squares are weakly amalgamable by weak
exactness of J and Z®. The pair of the remaining two squares is jointly weakly
amalgamable since it is induced by a weakly amalgamable square of two lax
triangles (and note that squares in $Sign^{\#}/{\equiv}$ induced by lax triangles in $Ind$
commute by definition of $\equiv$). Since weakly amalgamable squares can be pasted
together, we get a weakly amalgamable cocone for the original diagram.














Proof of Corollary 37. In the sequel, we will use terms like “connected”,
“maximal”, “lower bound” for small categories, when we really mean the
pre-order obtained from the category by collapsing the hom-sets into singletons. A
maximal element in a pre-order is an element which is equivalent to any element
above it.

Let $D\colon J \to Sign^{\#}$ be a connected diagram and let $Max$ be the set of
maximal nodes in $J$. We successively construct new diagrams out of $J$. Take two
nodes in $Max$ that have a common lower bound (if two such nodes do not exist,
the diagram is not connected). By Theorem 36, there is a weakly amalgamating
cocone for the sub-diagram consisting of the two maximal nodes and the lower
bound (together with the arrows from the lower bound into the maximal nodes).
Extend the diagram with the cocone. The diagram thus obtained now has a set
of maximal nodes whose size is decreased by one. By iterating this construction,
we get a diagram with one maximal node. The maximal node then is just the
tip of a weakly amalgamating cocone for the original diagram.


Proof of Prop. 38.
A model $M_1 \in |Mod(\sigma_1)^{-1}(Mod(\sigma_2)(Mod(\Sigma_2, \Psi_2)))|$ is nothing but a pair
$(M_1, M_2)$ of models $M_1 \in |Mod(\Sigma_1)|$, $M_2 \in |Mod(\Sigma_2, \Psi_2)|$ with common reduct
to $\Sigma$. This pair can be amalgamated to a model $M_3 \in |Mod(\Sigma_3)|$. Since
$M_3|_{\theta_2} = M_2$, by the satisfaction condition, $M_3 \models_{\Sigma_3} \theta_2(\Psi_2)$. By the assumption, also
$M_3 \models_{\Sigma_3} \theta_1(\Psi_1)$. But this means $M_1 = M_3|_{\theta_1} \models_{\Sigma_1} \Psi_1$.

Some Varieties of Equational Logic
(Extended Abstract)* 


Gordon Plotkin 


LFCS, School of Informatics, University of Edinburgh, UK. 


The application of ideas from universal algebra to computer science has long been 
a major theme of Joseph Goguen’s research, perhaps even the major theme. One 
strand of this work concerns algebraic datatypes. Recently there has been some 
interest in what one may call algebraic computation types. As we will show, 
these are also given by equational theories, if one only understands the notion 
of equational logic in somewhat broader senses than usual. 

One moral of our work is that, suitably considered, equational logic is not
tied to the usual first-order syntax of terms and equations. Standard equational
logic has proved a useful tool in several branches of computer science, see, for
example, the RTA conference series [9] and textbooks, such as [1]. Perhaps the
possibilities for richer varieties of equational logic discussed here will lead to
further applications.

We begin with an explanation of computation types. Starting around 1989,
Eugenio Moggi introduced monadic notions of computation [11,12], the idea
being that, for appropriately chosen monads $T$ on, e.g., Set, the category
of sets, one thinks of $T(X)$ as the type of computations of an element of $X$. For
example, for side-effects one takes the monad $T_S(X) =_{def} (S \times X)^S$ where $S$ is
the set of states. Below, we take $S =_{def} V^{Loc}$ where $V$ is a countably infinite
set of values such as the natural numbers, and $Loc$ is a finite set of locations.
See [2] for a recent exposition of Moggi’s ideas, particularly emphasising the
connections with functional programming, where the monadic approach has been
very influential.
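As a rough illustration of the side-effects monad $T_S(X) = (S \times X)^S$, the following Python sketch represents a computation as a function from a state to a pair of a new state and a result. The names `unit`, `bind`, `tick` and `two_ticks` are illustrative assumptions, not taken from the text, and the state is simplified to a single integer counter rather than $V^{Loc}$.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")
X = TypeVar("X")

# A computation of an X with side effects on states S: an element of (S x X)^S.
Comp = Callable[[S], Tuple[S, X]]


def unit(x):
    """Return x without touching the state."""
    return lambda s: (s, x)


def bind(m, k):
    """Run m, then pass its result to k, threading the state through."""
    def run(s):
        s1, x = m(s)
        return k(x)(s1)
    return run


def tick(s):
    """Hypothetical effect: return the counter and increment it."""
    return (s + 1, s)


# Two ticks in sequence, returning both observed counter values.
two_ticks = bind(tick, lambda a: bind(tick, lambda b: unit((a, b))))
```

Running `two_ticks` from an initial counter threads the state through both ticks, which is the sense in which $T_S(X)$ is the type of stateful computations of an element of $X$.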

As is well known, equational theories give rise to free algebra monads. For
example the free semilattice monad arises from the theory of a binary operation
$\cup$ subject to the axioms of associativity, commutativity and idempotence, where
the last is the equation $x \cup x = x$. The induced monad $T_{\cup}(X)$ is the collection
of all non-empty finite subsets of $X$. In general, the equational theories with
operations of finite arity induce exactly those monads which have finite rank,
see, e.g., [19].
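The free semilattice monad can be sketched in Python with non-empty `frozenset` values standing for the elements of $T_{\cup}(X)$. The function names below are illustrative assumptions; the sketch only spot-checks the semilattice axioms and the monad structure on sample inputs and is not a proof.

```python
from itertools import chain


def unit(x):
    """Embed an element as a singleton subset."""
    return frozenset([x])


def bind(m, k):
    """Union of k(x) over all x in m; m is assumed to be a non-empty frozenset."""
    return frozenset(chain.from_iterable(k(x) for x in m))


def union(a, b):
    """The binary semilattice operation on finite subsets."""
    return a | b


a = frozenset({1, 2})
b = frozenset({2, 3})


def k(x):
    """Sample Kleisli map sending x to the two-element set {x, 10x}."""
    return frozenset({x, x * 10})
```

Since `bind` is defined by taking unions, it commutes with `union`, reflecting the fact that the operation of the theory is algebraic for the induced monad.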

* This work has been done with the support of EPSRC grant GR/S86372/01 and a
Royal Society-Wolfson research merit award.

In denotational semantics one typically employs a category of ordered
structures, such as $\omega$-Cpo, the category of $\omega$-cpos, which are partial orders with lubs
of increasing $\omega$-chains, and with morphisms those monotonic functions preserving
the $\omega$-lubs. An $\omega$-Cpo-semilattice is a semilattice in $\omega$-Cpo, that is an $\omega$-cpo
together with a continuous binary function satisfying the semilattice axioms;
the free $\omega$-Cpo-semilattice monad is (a generalisation of) the convex
powerdomain monad, originally defined only on a subcategory [5]. There are also lower,
or Hoare, and upper, or Smyth, powerdomain monads; these are obtained by
adding an additional axiom, viz:

\[ x \leq x \cup y \]

for the lower powerdomain, and:

\[ x \geq x \cup y \]

for the upper one. Note that these are inequations rather than equations.
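To see the two inequations at work in a small finite setting, one can compare finite subsets of a poset using the usual lower (Hoare) and upper (Smyth) preorders. This is an assumption made for illustration only: the text works with $\omega$-cpos, whereas the sketch uses finite sets of integers and hypothetical helper names.

```python
def hoare_le(a, b, le):
    """Lower (Hoare) preorder: every element of a is below some element of b."""
    return all(any(le(x, y) for y in b) for x in a)


def smyth_le(a, b, le):
    """Upper (Smyth) preorder: every element of b is above some element of a."""
    return all(any(le(x, y) for x in a) for y in b)


def le(x, y):
    """The underlying order, here the usual order on integers."""
    return x <= y


low, high = {1}, {5}

# Lower powerdomain axiom x <= x u y, and a failure of its converse.
lower_axiom = hoare_le(low, low | high, le)
lower_converse = hoare_le(low | high, low, le)

# Upper powerdomain axiom x >= x u y, and a failure of its converse.
upper_axiom = smyth_le(high | low, high, le)
upper_converse = smyth_le(high, high | low, le)
```

In both cases the axiom holds while its converse fails, so adding either inequation is a genuine strengthening of the semilattice theory on these samples.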

This idea was carried further in [15] where similar characterisations were
noted for other important monads arising in Moggi’s approach, such as those
for exceptions, state, input/output, probabilistic nondeterminism and
nontermination. One of the main contributions there was an axiomatisation of the state
monad employing families of operations of finite or countably infinite arity, as
follows. For each location $l$ one assumes given an operation symbol:

\[ \mathrm{lookup}_l \]

of arity the countably infinite set $V$ (it is convenient to allow any set to be an
arity, not just a cardinal) and for each location $l$ and value $v$ one assumes
given a unary operation symbol:

\[ \mathrm{update}_{l,v} \]

The idea is that a term of the form $\mathrm{lookup}_l(\ldots t_v \ldots)$ denotes the computation
which looks up the contents of $l$ in the current state and, if this is $v$, then proceeds
according to the computation denoted by the $v$-th argument, $t_v$. Similarly a term
of the form $\mathrm{update}_{l,v}(t)$ denotes the computation which first updates the contents
of the location $l$ to $v$ and then proceeds according to the computation denoted
by $t$.
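The intended reading of the two families of operations can be sketched in Python as follows. States are modelled as dictionaries from locations to values, and the $V$-indexed family of arguments of $\mathrm{lookup}_l$ is modelled as a Python function from values to computations. All names are illustrative assumptions rather than notation from [15].

```python
def unit(x):
    """The computation that returns x and leaves the state unchanged."""
    return lambda s: (s, x)


def lookup(l, branches):
    """lookup_l(... t_v ...): read location l, then continue with the branch t_v.

    `branches` maps a value v to the computation t_v.
    """
    return lambda s: branches(s[l])(s)


def update(l, v, t):
    """update_{l,v}(t): write v to location l, then continue with t."""
    return lambda s: t({**s, l: v})


# Write 3 to location "a", read it back, and return its successor.
program = update("a", 3, lookup("a", lambda v: unit(v + 1)))
result = program({"a": 0, "b": 7})
```

Here `result` pairs the final state, in which only location `"a"` has changed, with the value computed from the contents that were looked up.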

These ideas have been elaborated into what may be termed the algebraic the- 
ory of notions of computation, where the operations and equations are primary 
and determine the monads. The computational importance of the operations is 
that it is they that give rise to the effects at hand [16]. Applications include
the operational semantics of effects [14], their modular combination [7] and,
prospectively, a general logic of effects [17]; see [18] for a survey.

The examples demonstrate that the algebraic theory of computation would 
benefit from a wider means of expression than is provided by standard equational 
theories: one also needs to consider parameterization, operations of countable, 
i.e., denumerable, arity and inequations. As we will see, a unifying rôle is played 
by Lawvere theories: each such kind of ‘equational’ theory corresponds to a 
kind of Lawvere theory, possibly enriched or countable rather than finitary, as 
standard.

Parameterization This occurs naturally in mathematics, for example in
the notion of a vector space over a given field $F$. There one has the axiom:

\[ \lambda(x + y) = \lambda x + \lambda y \]

which involves both field elements and vectors. To treat the notion as an
equational theory in the standard sense, one would introduce a unary operation of
‘multiplication by $\lambda$’ for each field element $\lambda$ and the axiom would be rendered
as a family of equations, with one for each field element. We will instead treat
the axiom as a single parametric equation, with a variable ranging over the
field and with multiplication by a field element treated as a parametric unary
operation on the vector space.

One can go further and allow ‘side-conditions,’ involving only the parameter
variables. For example, in the case of state, treating update as a unary operation
parametric over locations and values, one has the following parametric equation:

\[ \mathrm{update}_{l,v}(\mathrm{update}_{l',v'}(x)) = \mathrm{update}_{l',v'}(\mathrm{update}_{l,v}(x)) \quad (\text{if } l \neq l') \]

which has the side condition that $l \neq l'$; the equation states that the order in
which one updates distinct locations does not matter.

Such parametric equational theories abbreviate ordinary equational theories,
but, by allowing a schema to be replaced by a parametric equation with side
conditions, may enable finitary axiomatisation and consequent direct computer
implementation. Formally one assumes given an interpretation $\mathfrak{A}$ of a
many-sorted first-order signature, the parameter signature; for the equational part one
further assumes given a parametric signature where the operation symbols are
assigned a given list of sorts from the parameter signature as well as the usual
natural number. There is then a natural notion of parametric term where the
parameters are given by standard first-order terms over the parameter signature
and so of parametric equation:

\[ t = u \quad (\varphi) \]

with side condition $\varphi$ written in first-order logic with equality over the
parameter signature. A collection of such equations abbreviates, as indicated above, a
standard equational theory over a derived signature.
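The sense in which a parametric equation with a side condition abbreviates a family of ordinary equations can be sketched by enumerating its instances. The sketch below expands the commutation law for updates of distinct locations; the location names, the finite sample of values (the text takes $V$ to be countably infinite) and the string encoding of derived operation symbols are all assumptions made for illustration.

```python
from itertools import product

LOCS = ("l1", "l2")  # assumed finite set Loc
VALS = (0, 1)        # finite sample standing in for the countably infinite V


def instances():
    """Yield the ordinary equations abbreviated by the parametric one."""
    for l, v, l2, v2 in product(LOCS, VALS, LOCS, VALS):
        if l != l2:  # side condition, evaluated over the parameter interpretation
            lhs = f"update_{l}_{v}(update_{l2}_{v2}(x))"
            rhs = f"update_{l2}_{v2}(update_{l}_{v}(x))"
            yield (lhs, rhs)


equations = list(instances())
```

Each pair in `equations` is an equation over the derived signature, which has one unary symbol per pair of a location and a value; the single parametric equation replaces the whole family.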

There is a natural system for deriving these parametric equations from a given
collection Th of first-order formulas with equality over the parameter signature,
together with another given collection Eqn of parametric equations; the system
includes first-order logic with equality for the parameter spaces and equational
logic for the parametric equations. One can define whether a parametric equation
is a semantic consequence of Th and Eqn relative to the fixed interpretation $\mathfrak{A}$,
but, unfortunately, taking Th to be the theory of $\mathfrak{A}$, completeness need not hold.
It may, however, hold in particular cases: one such is that of vector spaces
mentioned above taking the standard ‘ring signature’ for the many-sorted first-order
signature. On the other hand, fixing Th and Eqn, one can show completeness, if
by validity one means with respect to all models of Th.

Infinitary operations One can treat operations of countable arity using the
evident natural notions of countable equational theory and countable Lawvere
theory; the induced monads are those of countable rank. Here is an example of a
schema of infinitary equations involving the operation of looking up the contents
of a location:

\[ \mathrm{lookup}_l(\ldots \mathrm{update}_{l,v}(x) \ldots) = x \]

The equation states that if a location is looked up and then updated with the
value found, then that is equivalent to doing nothing.

However it would again be preferable to have a finitary syntax, now for
operations of countably infinite arity. To that end, we employ binding on variables
of the arity sort, here val (standing for $V$); the term-forming construction for
lookup is then:

\[ \mathrm{lookup}_a(v\colon \mathrm{val}.\, t) \]

where $a$ is a parameter term of sort loc (standing for $Loc$) and $t$ is a parametric
term given the environment $v\colon \mathrm{val}$. With this, the above infinitary schema can
be written as the following finitary ‘equation’:

\[ \mathrm{lookup}_l(v\colon \mathrm{val}.\, \mathrm{update}_{l,v}(x)) = x \]

We consider next the following infinitary equation scheme:

\[ \mathrm{lookup}_l(\ldots \mathrm{update}_{l',v'}(x_v) \ldots) = \mathrm{update}_{l',v'}(\mathrm{lookup}_l(\ldots x_v \ldots)) \quad (\text{if } l \neq l') \]

which states that the operations of looking up one location and updating another
commute. Notice that it employs a family $x_v$ of variables. If we introduce the
notion of a parametric variable (ranging over a suitable collection of functions)
this infinitary equation scheme can also be rendered in a finitary fashion:

\[ \mathrm{lookup}_l(v\colon \mathrm{val}.\, \mathrm{update}_{l',v'}(x_v)) = \mathrm{update}_{l',v'}(\mathrm{lookup}_l(v\colon \mathrm{val}.\, x_v)) \quad (\text{if } l \neq l') \]

These two ideas of binding and parametric variables suffice to write down all
the parameterized, possibly infinitary, equation schemes for global state given
in [15] finitarily.
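Binding and parametric variables have a direct analogue in a programming language: the bound variable $v$ becomes the parameter of a closure, and a parametric variable $x_v$ becomes a function from values to computations. The Python sketch below checks the two finitary equations on sample computations over a small state space. The finite value set, the helper names and the choice of sample computations are assumptions, and the check is a sanity test rather than a proof of validity.

```python
from itertools import product

LOCS = ("a", "b")  # assumed finite set Loc
VALS = (0, 1, 2)   # finite stand-in for the countably infinite V


def unit(x):
    return lambda s: (s, x)


def lookup(l, body):
    """lookup_l(v: val. t), with the binder modelled as a Python function."""
    return lambda s: body(s[l])(s)


def update(l, v, t):
    return lambda s: t({**s, l: v})


def all_states():
    for vs in product(VALS, repeat=len(LOCS)):
        yield dict(zip(LOCS, vs))


def equal(c1, c2):
    """Extensional equality of computations over the sample state space."""
    return all(c1(s) == c2(s) for s in all_states())


x = unit("x")


def x_family(v):
    """A sample parametric variable: one computation for each value v."""
    return unit(("x", v))


# lookup_l(v: val. update_{l,v}(x)) = x
eq_lookup_update = equal(lookup("a", lambda v: update("a", v, x)), x)

# lookup_l(v: val. update_{l',v'}(x_v)) = update_{l',v'}(lookup_l(v: val. x_v)), l != l'
eq_commute = equal(
    lookup("a", lambda v: update("b", 1, x_family(v))),
    update("b", 1, lookup("a", x_family)),
)

# The same shape with l = l' violates the side condition and fails.
eq_same_location = equal(
    lookup("a", lambda v: update("a", 1, x_family(v))),
    update("a", 1, lookup("a", x_family)),
)
```

The last comparison shows why the side condition is needed: when the two locations coincide, the value seen by the lookup depends on whether the update has already happened.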

In the general formalism, we again begin with an interpretation $\mathfrak{A}$ of a
parameter signature, as above, except that we assume also given a subcollection of
the sorts, called the arity sorts. In the parametric signature an operation symbol
has $m$ parameter arguments of given parameter sorts, and $n$ argument positions,
with the $i$th being abstracted on $k_i$ arity sorts. A collection of parametric
equations abbreviates a countable equational theory, provided that the arity sorts
are interpreted by countable sets.

One can then give a logic following the previous lines. An immediate question
is whether the logic is complete for global state, where for the many-sorted
first-order signature one would take the two sorts, loc and val, and constants for all
the elements of $Loc$, with the evident interpretation using $V$ and $Loc$. We would
also like to know whether we have completeness relative to all interpretations
of a given theory, as we do in the simpler case, considered above, of finitary
operations. Positive answers to such questions would demonstrate that valid
uniform infinitary equations have uniform proofs.


Inequations These are a natural generalisation of equations and there is
an evident notion of inequational, or ordered, equational logic over operations
of finite arity, which has a straightforward completeness theorem using posets
rather than sets [3]. The resulting ordered equational theories correspond to
ordered Lawvere theories, in the sense of [23,3]. These are not the same as the
Pos-enriched Lawvere theories of [19], as the latter allow all finite posets as
arities of operations, not just the discrete ones. However they are the same as the
Pos-enriched Lawvere theories of [10], equivalently the Pos-enriched discrete
Lawvere theories of [20]. There is a natural generalisation to countable
inequational logic, and the inequational theories of this logic correspond to the discrete
countable Pos-theories (the countable case is the main one considered in [20]).
In general discrete V-theories of a given rank freely induce V-theories of that
rank, in the sense of [19], and the latter induce the V-monads of the same rank;
not all such monads arise from discrete theories.


Parameterization, now over given posets, is again an expressive convenience,
and there are inequational versions of the two equational deductive systems
considered above: one for parametric inequations and the other with finitary
syntax for infinitary operations. For the parameter interpretation $\mathfrak{A}$ it is natural
to work with enriched first-order structures, which we take to mean here that
sorts are interpreted by posets, operations by monotonic functions and relations
by subsets; one then naturally works with first-order logic with inequations $a \leq b$,
rather than equations, to express parameter conditions. One evidently requires
arity sorts to be interpreted by countable discrete partial orders to obtain discrete
countable Pos-theories from a collection of parametric inequations.


Turning to $\omega$-Cpo-enrichment, one can consider discrete finitary or countable
$\omega$-Cpo-theories. Here parameterization is more than an expressive convenience:
it enables one to implicitly write down equations involving sups of increasing
chains. One can still work with simple inequations, but rather than finitary or
countably infinitary operation symbols, one takes families of such, parameterized
over a collection of parameter $\omega$-cpos. They are to be interpreted by functions
which are continuous over the parameter $\omega$-cpos as well as the algebra $\omega$-cpo. A
natural example is provided by d-cones, which arise when considering
powerdomains for mixed ordinary and probabilistic nondeterminism [22]. These are the
$\omega$-Cpo-semimodules over the semiring $\overline{\mathbb{R}}_{+}$, which latter is the $\omega$-cpo of the
non-negative reals extended with a point at infinity, and endowed with the natural
semiring structure [13].


Collections of such inequations induce the discrete finitary or countable
$\omega$-Cpo-theories, according to the arities of the operation symbols allowed.
However there is a question as to what is the appropriate inequational logic. It may be
best to introduce an explicit infinitary syntax for sups of increasing $\omega$-sequences,
but then sup-terms would only be well-formed if one could prove the sequence
was increasing, and that would mean a mutual recursion between the definitions
of proofs and well-formed terms. It remains to investigate such a system.

The next question is to what extent one can achieve a useful finitary
system. One can clearly investigate analogues of the methods used above to handle
parameterization and operations of countably infinite arity. But it is far from
clear what to do about the sup-terms. Perhaps one can restrict to considering
only least fixed-points and work with a combination of the above ideas and the
$\mu$-calculus, for which, and associated logical and categorical results, see [4,8,21].

Whatever the difficulties are with finding the right logic, it is at least the
case that the combination of parameterization, binding constructions and
inequations, interpreted over $\omega$-Cpo, is enough to express all the theories of
computation types so far considered over that category. We should admit, however,
that this is not quite enough to account for all the computation types so far
considered. One difficult case is that of the continuations monad. However one
can argue that there the types should not be treated as algebraic since the
natural operations are not even of the right type to be algebraic operations, and,
further, the monad does not have a rank [6].

A more interesting case is that of local state, as opposed to the above global
state, where one can declare new locations. This was treated using a monad over
a presheaf category in [15]. The monad was specified by equations, but they
involved a mixture of linear and ordinary operations, with the linear structure
coming from the Day tensor on the presheaf category. This example feels as if
it should be treatable within an algebraic framework, but we do not see the
proper notions of Lawvere theory or equational theory. Finally there is also the
possibility of employing other semantic categories in place of $\omega$-Cpo for the
algebraic computational types; we content ourselves here with the remark that
for reasonable such categories, one would expect the relevant free algebras still
to exist.

for Satisfaction as Injectivity

Abstract. Birkhoff (quasi-)variety categorical axiomatizability results
have fascinated many scientists by their elegance, simplicity and gener- 
ality. The key factor leading to their generality is that equations, con- 
ditional or not, can be regarded as special morphisms or arrows in a 
special category, where their satisfaction becomes injectivity, a simple 
and abstract categorical concept. A natural and challenging next step is 
to investigate complete deduction within the same general and elegant 
framework. We present a categorical deduction system for equations as 
arrows and show that, under appropriate finiteness requirements, it is 
complete for satisfaction as injectivity. A straightforward instantiation 
of our results yields complete deduction for several equational logics, in 
which conditional equations can be derived as well at no additional cost, 
as opposed to the typical method using the theorems of constants and 
of deduction. To our knowledge, this is a new result in equational logics.


1 Introduction 


Equational logic is an important paradigm in computer science. It admits
complete deduction and is efficiently mechanizable by rewriting: CafeOBJ [15],
Maude [12] and Elan [9] are equational specification and verification systems
in the OBJ family that can perform millions and tens of millions of rewrites
per second on standard PC platforms. It is expressive: Bergstra and Tucker
showed that any computable data type can be characterized by means of a finite
equational specification, and Goguen and Malcolm [17], Wand [41], Broy,
Wirsing and Pepper [11], and many others showed that equational logic is essentially
strong enough to easily describe virtually all traditional programming language
features. It has simple semantic models: its models are algebras, straightforward
and intuitive structures. We suggest Goguen and Malcolm [19] and Padawitz
and Wirsing [31] as good references for many-sorted equational logic, its
completeness, as well as applications to computer science.

There are many variants and generalizations of equational logics, ranging
from unsorted [7] to many-sorted [19,31], to partial [32], to order-sorted [20,40],
to membership [27,10], to local [13], to hidden [18,34] equational logics, and
so on. A major challenge is to develop a uniform common framework for all
these variants, that allows one to formulate and prove at least some of their
important properties, such as Birkhoff axiomatizability, complete deduction and
Craig interpolation. Whether this is possible or not is open, but what is certain
is the existence of elegant categorical equational variants by Banaschewski and
Herrlich [4], Andréka, Németi and Sain [2,3,30], Adámek and Rosický and
many others, in which equations are viewed as epimorphisms and their
satisfaction as injectivity, and that these allow very general treatments of variety and
quasi-variety results. We also adopt this categorical view in the present paper.

To emphasize the simplicity and generality of this approach, we mention that
everything happens within only one category, denoted by $\mathcal{C}$ in this paper, which
has a factorization system $(\mathcal{E}, \mathcal{M})$. The objects of $\mathcal{C}$ are viewed as models and
the morphisms in $\mathcal{E}$, which for simplicity will be called equations, are viewed as
sentences¹. In order to define our sound (w.r.t. injectivity) four-rule inference
system for arrows in $\mathcal{E}$, $\mathcal{C}$ is required to additionally have pushouts and enough
$\mathcal{E}$-projectives. To show it complete, $\mathcal{C}$ also needs to have directed colimits and to
be $\mathcal{E}$-co-well-powered, and some appropriate notions of finiteness for arrows in
$\mathcal{E}$ need to be introduced. A related variant by Diaconescu [14], called
category-based equational logic, considers equations as pairs of arrows, one for each term,
and then gives a set of deduction rules that resembles that of equational logics.

The present paper is part of our efforts to develop a unifying, categorical
framework for axiomatizability, deduction and interpolation for equational and
coequational logics. In [37] it is shown that the difference w.r.t. injectivity
between epimorphisms of free/projective sources and epimorphisms of any sources
is exactly as the difference w.r.t. usual satisfaction between unconditional and
conditional equations, that is, the first define varieties while the second define
quasi-varieties. In [33,36], equational axiomatizability for hidden equational logic
and coalgebra is investigated, and in [38] a categorical generalization of
equational interpolation is given. The closest to the present paper is [35], where we
also present a complete four-rule inference system for equations as epics, but
limited to unconditional axioms. In the present paper, due to crucial developments
of finiteness concepts and results, especially Proposition [B], we non-trivially
extend the results in [35] by eliminating the admittedly frustrating limitation to
unconditional axioms, putting thus an end to our quest for complete deduction
when satisfaction is injectivity. We show that a four-rule inference system for
epics is complete provided that all the axioms have finite conditions and the
equation to be derived is finite. An interesting characteristic of our deduction
system is that it is also complete for conditional equations, and that those can
be derived the same way as the unconditional ones. We are not aware of any
similar result for any equational paradigm in the literature until [35], where a
version of it, restricted to unconditional axioms, was presented.

Section 2 recalls some categorical concepts and introduces our notational
conventions. Section 3 reviews factorization systems. Section 4 shows how
equations, both unconditional and conditional, are equivalent to surjective morphisms
and their satisfaction to injectivity; clarifying examples are presented. Section 5
introduces our four-rule inference system for arrows and shows how it works
on various examples. Finiteness concepts and results, which are necessary in
Section 7 to show the completeness result, are explored in Section 6. The last section
concludes the paper and presents challenges for further research.

¹ If one thinks that equations should be regular epimorphisms then one can read so
instead of “epimorphism.” Our results hold for any epimorphisms, so a restriction
to regular epimorphisms would be technically artificial and less general.


2 Preliminaries 


The reader is assumed familiar with basic concepts of category theory
and equational logics [7,8,31,19]. In this section we introduce our notations and
conventions, and recall some less frequent notions. Given a category $\mathcal{C}$, let $|\mathcal{C}|$
denote its class of objects; we use diagrammatic order for composition of
morphisms, i.e., if $f\colon A \to B$ and $g\colon B \to C$ then $f;g\colon A \to C$. If the source or the
target of a morphism is not important in a certain context, then we replace it
by a bullet to avoid inventing new letters; for example, $f\colon A \to \bullet$. In situations
where there are more bullet objects, they may be different. If $f\colon A \to B$ and
$g\colon A \to C$ have a pushout then we let $f^{g}\colon C \to \bullet$ and $g^{f}\colon B \to \bullet$ denote the
opposite arrows, up to isomorphism, of $f$ and $g$ in that pushout.

Given a class of morphisms $\mathcal{E}$ in a category $\mathcal{C}$, $P \in |\mathcal{C}|$ is called $\mathcal{E}$-projective
iff for any $e\colon \bullet \to X$ in $\mathcal{E}$ and any $h\colon P \to X$, there is a $g$ s.t. $g;e = h$. $\mathcal{C}$ has
enough $\mathcal{E}$-projectives iff for each object $X \in |\mathcal{C}|$ there is some $\mathcal{E}$-projective
object $P_X$ and a morphism $e_X\colon P_X \to X$ in $\mathcal{E}$. It is known that any set is
$\mathcal{E}$-projective where $\mathcal{E}$ consists of all the surjective functions, that free algebras are
$\mathcal{E}$-projective where $\mathcal{E}$ is the class of surjective morphisms, and that the category
of algebras has enough $\mathcal{E}$-projectives (for an algebra $X$, one can take $P_X$ to be the
free algebra over the elements in $X$ seen as variables). Dually, $I$ is $\mathcal{E}$-injective
iff for any $e\colon X \to \bullet$ in $\mathcal{E}$ and any $h\colon X \to I$, there is a $g$ s.t. $e;g = h$. $\mathcal{C}$ is called
$\mathcal{E}$-co-well-powered iff for any $X \in |\mathcal{C}|$ and any class $D$ of morphisms in $\mathcal{E}$ of
source $X$, there is a set $D' \subseteq D$ such that each morphism in $D$ is isomorphic to
some morphism in $D'$; we often call $D'$ a representative set of $D$.

If $X$ is an object in a category $\mathcal{E}$, then $X \downarrow \mathcal{E}$ is the comma category containing
morphisms $e, e', \ldots\colon X \to \bullet$ in $\mathcal{E}$ as objects and morphisms $h \in \mathcal{E}$ such that
$e;h = e'$ as morphisms. Notice that if $\mathcal{E}$ contains only epimorphisms then there
is at most one morphism between any two objects in $X \downarrow \mathcal{E}$. The intuition in
our framework for the objects $e, e', \ldots\colon X \to \bullet$ in the comma category $X \downarrow \mathcal{E}$
will be that of equations over the same source (variables, condition).


3 Factorization Systems 


The idea to form subobjects by factoring each morphism $f$ as $e;m$, where $e$ is
an epic and $m$ is a mono, seems to go back to Grothendieck [22] in 1957, and
was intensively used by Isbell [24], Lambek [25], Mitchell [28], and many others.
Lambek was probably the first to explicitly state a diagonal-fill-in property in
1966 [25], called also “orthogonality” by Freyd and Kelly in [16]. One of the
first formal definitions of a factorization system that we are aware of was given
by Herrlich and Strecker [23] in 1973, under the name factorizable category, and
a comprehensive study of factorization systems, containing different equivalent
definitions, was done by Németi in 1982.


Definition 1. A factorization system of a category $\mathcal{C}$ is a pair $(\mathcal{E}, \mathcal{M})$, s.t.:


— $\mathcal{E}$ and $\mathcal{M}$ are subcategories of epics and monics, respectively, in $\mathcal{C}$,

— all isomorphisms in $\mathcal{C}$ are both in $\mathcal{E}$ and $\mathcal{M}$, and

— each morphism $f$ in $\mathcal{C}$ can be factored as $e;m$ with $e \in \mathcal{E}$ and $m \in \mathcal{M}$
“uniquely up to isomorphism”, that is, if $f = e';m'$ is another factorization
of $f$ then there is a unique isomorphism $\alpha$ such that $e;\alpha = e'$ and $\alpha;m' = m$.
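As a concrete instance of Definition 1, consider the category of sets with $\mathcal{E}$ the surjections and $\mathcal{M}$ the injections; this choice is an assumption made for illustration. The Python sketch below factors a function on a finite domain through its image, with finite functions represented as dictionaries and with hypothetical names throughout.

```python
def factor(f, domain):
    """Factor f as e;m, with e a surjection onto the image and m the inclusion."""
    image = sorted({f(a) for a in domain})
    e = {a: f(a) for a in domain}  # surjective onto the image
    m = {b: b for b in image}      # injective inclusion of the image in the codomain
    return e, m, image


def f(a):
    """Sample function from {0,...,5} to the integers."""
    return (a % 3) * 10


domain = [0, 1, 2, 3, 4, 5]
e, m, image = factor(f, domain)

# In diagrammatic order, e;m means "first e, then m".
composite = {a: m[e[a]] for a in domain}
```

Any other factorization of `f` into a surjection followed by an injection passes through a set in bijection with `image`, which is the uniqueness up to isomorphism required by the definition.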


The following are important properties of factorization systems: 


Proposition 1. Let $(\mathcal{E}, \mathcal{M})$ be a factorization system for $\mathcal{C}$, and let $e \in \mathcal{E}$ and
$f \in \mathcal{C}$ be morphisms having the same source. Then


1. Diagonal-fill-in. If $f;m = e;g$ for some $m \in \mathcal{M}$ then there is a “unique up to isomorphism”
$h \in \mathcal{C}$ such that $e;h = f$ and $h;m = g$, and
2. Pushout. If the pushout of $e$ and $f$ exists then $e^{f} \in \mathcal{E}$.


For the rest of the paper, suppose that (ℰ, ℳ) is a factorization system for 
a category C. The proof of the following proposition, which intuitively shows 
conditions under which “equations can be put together,” can be found in [35]: 


Proposition 2. If X ∈ |C| and C has colimits then X ↓ ℰ has colimits. 


When C is ℰ-co-well-powered, colimits in X ↓ ℰ also exist for large diagrams 
D (whose nodes form a class): one takes the colimit of a representative set of D. 


Definition 2. We let ({γ_i}_{i∈I}, e_D: X → X_D) denote the colimit of D ⊆ X ↓ ℰ, 
and use e_1 ∪ e_2 instead of e_D if D consists of only e_1: X → • and e_2: X → •. 


4 Equations as Epimorphisms 


As advocated by Banaschewski and Herrlich [4], by Andréka, Németi and Sain 
PIBO], and by many others including the author [37, 35], equations can be 
regarded as epimorphisms and their satisfaction as injectivity. Readers with 
different backgrounds may find different explanations or intuitions for these 
relationships. We next informally give our version, which seems closest in spirit to 
the subsequent results, together with some examples inspired by group theory. 

An unconditional equation e over variables x, y, ... is nothing but a binary 
relation R_e (containing only one pair) on the term algebra T(x, y, ...). This 
relation generates a congruence C_e, which further generates a surjective morphism 
of free source s_e: T(x, y, ...) → T(x, y, ...)/C_e. An algebra satisfies e iff it is 
{s_e}-injective. Conversely, the kernel K_s of a surjective morphism of free source 
s: T(x, y, ...) → • is nothing but a set of equations quantified by x, y, ..., and an 
algebra is {s}-injective iff it satisfies K_s. It is often more convenient to work with 
sets of equations rather than with individual equations, as perhaps best 
illustrated by Craig interpolation results that do not hold for individual equations 
but do hold for sets of equations. In this paper, by equation we also mean 
a set of individual equations over the same variables, so there is a one-to-one 
correspondence between equations and epimorphisms of free sources. 


Example 1. Let Σ be the unsorted signature consisting of a constant 1, a unary 
operation written as an overbar (as in x̄) and a binary operation written by 
juxtaposition, and let us consider the equations (∀x) x1 = x, (∀x) xx̄ = 1, and 
(∀x, y, z) x(yz) = (xy)z. In our notation, these equations correspond to the 
following three epimorphisms: 


axiom_1: T_Σ(x) → • generated by (x1, x), 
axiom_2: T_Σ(x) → • generated by (xx̄, 1), 
axiom_3: T_Σ(x, y, z) → • generated by (x(yz), (xy)z), 


where T_Σ(x) and T_Σ(x, y, z) are the Σ-term algebras over the variable x and over 
the variables x, y, z, respectively, and an epimorphism e: T_Σ(x, y, ...) → • is 
generated by a binary relation R of terms iff e is the natural surjection T_Σ(x, y, ...) → 
T_Σ(x, y, ...)/≡_R that maps each term to its congruence class. Notice that we could 
have also merged the first two epics into the epic axiom_1 ∪ axiom_2: T_Σ(x) → 
T_Σ(x)/{(x1,x),(xx̄,1)}. It is known that the algebras satisfying the three equations 
above are exactly the groups, i.e., the left unit and left inverse equations can be 
proved from the above. We will focus on these proofs in the next section. 


What is less known is that conditional equations can also be viewed as epics 
and their satisfaction as injectivity. This is explained in detail in . Intuitively, 
one first factors the term algebra by the condition and then takes the epic gen- 
erated by the equivalence classes of the conclusion. 


Example 2. The conditional equation (∀x) x = 1 if xx = 1 on groups (see 
Example 1) is, in our notation, equivalent to the epic 


axiom_4: T_Σ(x)/(xx,1) → • generated by (x, 1), 


where, for simplicity, we have identified equivalence classes with some 
representatives: (x, 1) should normally be (x̂, 1̂). A group satisfies this new axiom iff it 
has no proper square roots of unity iff it is {axiom_4}-injective. 
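
For a finite algebra, the satisfaction relation of Examples 1 and 2 can be checked by brute force: a morphism out of the free algebra is an assignment of the variables, and it factors through the quotient exactly when it identifies both sides of the generating pair. The sketch below does this for cyclic groups. The choice of Z_n as test algebras and the helper names `cyclic_group`, `satisfies`, `is_group_model` and `satisfies_axiom4` are assumptions made for illustration, not part of the paper.

```python
from itertools import product


def cyclic_group(n):
    """Z_n in multiplicative notation: unit 0, product is addition mod n."""
    elems = list(range(n))
    unit = 0
    inv = lambda a: (-a) % n
    mul = lambda a, b: (a + b) % n
    return elems, unit, inv, mul


def satisfies(elems, nvars, lhs, rhs, cond=None):
    """Check lhs = rhs over every assignment of the variables.

    If cond is given, only assignments meeting the condition are considered;
    these correspond to morphisms out of the quotient by the condition.
    """
    for v in product(elems, repeat=nvars):
        if cond is not None and not cond(*v):
            continue
        if lhs(*v) != rhs(*v):
            return False
    return True


def is_group_model(n):
    """Does Z_n satisfy axiom_1, axiom_2 and axiom_3 of Example 1?"""
    elems, one, inv, mul = cyclic_group(n)
    axiom1 = satisfies(elems, 1, lambda x: mul(x, one), lambda x: x)
    axiom2 = satisfies(elems, 1, lambda x: mul(x, inv(x)), lambda x: one)
    axiom3 = satisfies(elems, 3,
                       lambda x, y, z: mul(x, mul(y, z)),
                       lambda x, y, z: mul(mul(x, y), z))
    return axiom1 and axiom2 and axiom3


def satisfies_axiom4(n):
    """Does Z_n satisfy x = 1 if xx = 1 (no proper square roots of unity)?"""
    elems, one, inv, mul = cyclic_group(n)
    return satisfies(elems, 1, lambda x: x, lambda x: one,
                     cond=lambda x: mul(x, x) == one)


assert is_group_model(3)
assert satisfies_axiom4(3)       # Z_3 has no proper square root of unity
assert not satisfies_axiom4(2)   # in Z_2 the non-unit element squares to the unit
```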


In theoretical efforts, it is often technically easier to abstract freeness by 
projectivity. We have shown in that there is essentially no difference between 
projective and free sources of epimorphisms with respect to axiomatizability, and 
that free objects are usually projective in almost any category. The results in 
this paper also hold for both situations, but we only discuss projective sources. 

For the rest of the paper we assume that C, besides its factorization system 
(ℰ, ℳ), also has enough ℰ-projectives. Moreover, for each object X ∈ |C| we fix 
an arbitrary ℰ-projective object P_X and an arbitrary morphism e_X: P_X → X 
in ℰ. If C is the category of algebras over some signature and X is the quotient 
of a free algebra by some congruence, then P_X is usually taken to be the free 
algebra and e_X to map each term to its congruence class. 


Fig. 1. Categorical inference rules: Identity, Union, Restriction, and ℰ-Substitution. 


Definition 3. We call the morphisms in ℰ equations. If e: X → • is an 
equation then e_X: P_X → X is called its condition. If X = P_X then e is called 
unconditional. An object A in C satisfies the equation e: X → •, written 
A ⊨ e, if and only if A is {e}-injective. ⊨ trivially extends to sets of equations. 








5 Sound Deduction 


In this section we give four inference rules for equations as arrows as defined in 
the previous section, show that they are sound and give some examples. The first 
three rules also appeared in [35]. The fourth rule appeared in an over-simplified 
form in [35] because conditional axioms were not allowed there. 

In this section we assume that C, besides a factorization system (ℰ, ℳ) and 
enough ℰ-projectives, also has pushouts. 


Definition 4. Given a set of equations E, let ⊢ denote the derivation relation 
generated by the rules in Fig. 1, where ℰ-Substitution is a class of rules, one 
for each f: P_Y → X. If the source of e is X and E ⊢ e then e is called an 
X-derivation of E. Let D_X(E) denote the full subcategory of X ↓ ℰ of 
X-derivations of E. 

Note that E ⊢ e for each e ∈ E since one can take f = e_Y in ℰ-Substitution, and 
also that D_X(E) can be a class in general because E can be a class. Since 
equations in E were allowed to have only ℰ-projective sources in [35], ℰ-Substitution 
was a simple pushout there, for which reason it was called ℰ-Pushout. 


Theorem 1. Soundness. E ⊢ e implies E ⊨ e. 





Proof. The soundness of the first three rules is easy; we only show the soundness 
of ℰ-Substitution. Let us assume that E ⊨ e_Y^f; let A be any object such that 
A ⊨ E, and let g: X → A be any morphism. 

[Diagram: P_Y → Y → • along e_Y and e, with f: P_Y → X, the pushouts of f with e_Y and with e_Y;e, and g: X → A.] 

Since A ⊨ e_Y^f, there is a morphism h like in the diagram above, such that 
e_Y^f; h = g. Further, since A ⊨ e there is a morphism h' such that e; h' = f^{e_Y}; h. 
Hence f; g = (e_Y; e); h', so by the pushout property there is a morphism g' such 
that (e_Y; e)^f; g' = g. Therefore, A ⊨ (e_Y; e)^f, i.e., E ⊨ (e_Y; e)^f. 











Example 3. We show that the three arrows E = {axiom_1, axiom_2, axiom_3} 
defined in Example 1 indeed define the groups, that is, that the remaining arrows 
g_1: T_Σ(x) → • generated by (1x, x), and g_2: T_Σ(x) → • generated by (x̄x, 1), 
stating the left-unit and left-inverse axioms of groups, can be derived from E. 
The table in Figure 2 shows a possible proof, where the first column shows or 
gives names to newly inferred arrows, the second shows a set of generators of the 
kernel of the new arrow (a dash “-” means that the set of generators is obvious, 
so we do not write it to save space), and the third column shows the inference 
rule used to derive the new arrow (identity is omitted). 

To derive e_1, for example, one applies the substitution rule for e = axiom_1, 
where f: T_Σ(x) → T_Σ(x) takes x to x̄x, using tacitly the identity rule on 1_{T_Σ(x)}: 

[Diagram: the pushout of axiom_1 along f, defining e_1 := (axiom_1)^f.] 

We showed in 29 inference steps that the three axioms define groups. The careful 
reader may have noticed that we have used unnecessarily many Restriction steps. 
Indeed, if one does all the substitutions first, followed by all the unions, and then 
by reductions, then one can prove the above in only 19 steps. 


Arrow   Generated by     Rule 
e_1     ((x̄x)1, x̄x)     Substitution : axiom_1 
e_2                      Substitution : axiom_2 
e_3                      Substitution : axiom_3 
e_4                      Union : e_1 ∪ e_2 
e_5                      Union : e_3 ∪ e_4 
e_6                      Restriction : e_5 
e_7                      Substitution : axiom_1 
e_8                      Substitution : axiom_3 
e_9                      Union : e_7 ∪ e_8 
e_10                     Restriction : e_9 
e_11                     Substitution : axiom_1 
e_12                     Union : e_10 ∪ e_11 
e_13                     Restriction : e_12 
e_14                     Substitution : axiom_3 
e_15                     Union : e_13 ∪ e_14 
e_16                     Restriction : e_15 
e_17                     Substitution : axiom_3 
e_18                     Union : e_16 ∪ e_17 
e_19                     Restriction : e_18 
e_20                     Union : e_2 ∪ e_19 
e_21                     Restriction : e_20 
e_22                     Union : es ∪ e_21 
g_2                      Restriction : e_22 
e_23                     Union : e_14 ∪ g_2 
e_24                     Restriction : e_23 
e_25                     Substitution : axiom_1 
e_26                     Union : e_24 ∪ e_25 
e_27                     Union : e_11 ∪ e_26 
g_1                      Restriction : e_27 


Fig. 2. Deriving the remaining group properties. 


As mentioned before, a benefit of our deduction system is that one can also 
directly infer conditional equations. 


Example 4. In the same context as above, one can infer the conditional 
equation (∀x) x = x̄ if xx = 1, which in our notation is the arrow 
g_3: T_Σ(x)/(xx,1) → • generated by (x, x̄), as in the table in Figure 3. Note 
that e_30 was possible since its source was T_Σ(x)/(xx,1). 


Arrow                            Rule 
e_1, ..., e_22, g_2, ..., g_1    same as before 
e_28                             Substitution : axiom_3 
e_29                             Union : g_2 ∪ e_28 
e_30                             Restriction : e_29 
e_31                             Union : e_7 ∪ e_30 
e_32                             Union : g_1 ∪ e_31 
g_3                              Restriction : e_32 


Fig. 3. 


Example 5. In the context of groups without square roots of unity in Example 2, 
where E = {axiom_1, axiom_2, axiom_3, axiom_4}, we can derive the conditional 
equation (∀x, y) x = y if xȳ = yx̄, which in our notation is the morphism 
g_4: T_Σ(x,y)/(xȳ,yx̄) → • generated by (x, y). To apply substitution on axiom_4, 
with f: T_Σ(x) → T_Σ(x,y)/(xȳ,yx̄) taking x to xȳ, where Y = T_Σ(x)/(xx,1) 
and X = T_Σ(x,y)/(xȳ,yx̄) and where e_Y^f is generated by ((xȳ)(xȳ),1) and 
(e_Y; axiom_4)^f by (xȳ,1), we first derive e_Y^f like in Figure 4. The diagram 
below shows the relevant morphisms involved in this proof: 

[Diagram: T_Σ(x) → Y → • along e_Y and axiom_4, f: T_Σ(x) → X, and the pushout arrows e_Y^f and (e_Y; axiom_4)^f out of X.] 


Fig. 4. [Table deriving e_Y^f, (e_Y; axiom_4)^f and g_4 through the arrows e_28, ..., e_41; its rows are too damaged to reproduce.] 


6 Finiteness 


Since derivation of arrows involves a finite number of steps, one cannot expect 
any deduction system to be complete without some form of finiteness 
requirements. In this section we first recall the usual categorical concept dedicated to 
finiteness, then instantiate it to our framework, and then add one more 
requirement to factorization systems that makes them deal with finiteness smoothly. 

A nonempty partially ordered set (I, ≤) is directed provided that each pair 
of elements has an upper bound. A directed colimit in a category 𝒦 is a colimit 
of a diagram D: (I, ≤) → 𝒦, where (I, ≤) is a directed poset (regarded as 
a category). An object K of a category 𝒦 is finitely presentable provided that 
its hom-functor Hom(K, _): 𝒦 → Set preserves directed colimits. It is easy 
to see that K is finitely presentable iff for each directed colimit ({γ_i: D(i) → 
C}_{i∈|I|}, C) and each morphism f: K → C, there is an i ∈ |I| and a unique 
morphism f_i: K → D(i) such that f_i; γ_i = f. 
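
The directedness condition is easy to test on a finite poset. The following sketch checks it literally, pair by pair. The function name `is_directed` and the divisibility examples are assumptions chosen for illustration.

```python
from itertools import product


def is_directed(elements, leq):
    """A nonempty poset is directed iff every pair has an upper bound in it."""
    elements = list(elements)
    if not elements:
        return False
    return all(
        any(leq(a, c) and leq(b, c) for c in elements)
        for a, b in product(elements, repeat=2)
    )


divides = lambda a, b: b % a == 0

assert is_directed([1, 2, 3, 4, 6, 12], divides)   # 12 bounds every pair
assert not is_directed([1, 2, 3], divides)         # 2 and 3 have no upper bound
```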

There are many examples of finitely presentable objects, such as finite sets 
and posets, finite graphs and automata, finite and discrete topological spaces, 
algebras presented by finitely many generators and finitely many equations, etc. 
We refer the interested reader to [1] for many more examples, as well as 
interesting properties of finitely presentable objects. What is relevant to our paper 
is that a surjective morphism e: X — e of algebras is finitely presentable in the 
comma category of surjective morphisms of source X iff its kernel, regarded as 
a subalgebra of X x X, is finitely generated; in our setting, where equations are 
surjective morphisms, that means that e stands for a finite set of equations. 


Definition 5. Equation e: X → • is finite iff it is finitely presentable in X ↓ ℰ. 


If D ⊆ X ↓ ℰ is a finite diagram of finite equations, then with the notation 
in Definition 2, by Proposition 1.3 in [1] it follows that e_D is also finite. In 
particular, e_1 ∪ e_2 is finite whenever e_1 and e_2 are finite, so finiteness is preserved 
by union. We next give conditions under which finiteness is also preserved by 
pushout. Given a morphism m: X' → X in ℳ, one can build “up to an 
isomorphism” a functor F_m: X ↓ ℰ → X' ↓ ℰ as follows: for each e: X → •, let 
F_m(e): X' → • be the epic by which m;e factorizes, and for each e_1: X → X_1, 
e_2: X → X_2 and γ: X_1 → X_2 with e_1;γ = e_2, let F_m(γ) be the unique “up 
to isomorphism” morphism given by the diagonal-fill-in property applied to the 
diagram F_m(e_2); m_2 = F_m(e_1); (m_1; γ), where m; e_1 factors as F_m(e_1); m_1 
and m; e_2 factors as F_m(e_2); m_2. 

F_m(e) should be thought of as the restriction of e to X'. Interestingly, F_m does 
not preserve colimits in general. For example, if C is the category of sets then 
one can take X = {a_1, a_2, a_3}, X' = {a_1, a_3}, and e_1: X → • and e_2: X → • 
such that e_1(a_1) = e_1(a_2) and e_2(a_2) = e_2(a_3), respectively, and note that 
(e_1 ∪ e_2)(a_1) = (e_1 ∪ e_2)(a_3), while (F_m(e_1) ∪ F_m(e_2))(a_1) ≠ (F_m(e_1) ∪ F_m(e_2))(a_3), 
where m is the inclusion X' ⊆ X. However, F_m does preserve directed colimits 
both for sets and algebras. The proof is relatively easy but takes much space, so 
we leave it as an exercise to the interested reader (Hint: work with kernels instead 
of epics). With the notation above, 


Definition 6. The factorization system (ℰ, ℳ) is reasonable provided that F_m 
preserves directed colimits for each m ∈ ℳ. 
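
The three-element counterexample given before Definition 6 can be replayed mechanically by following the hint and working with kernels. In the sketch below, which assumes C is the category of sets with ℰ the surjections, an epic out of X is represented by its kernel (an equivalence relation), the union of two epics by the join of their kernels, and F_m by restriction of the kernel to X'. The helper names `equivalence` and `restrict` are illustrative assumptions.

```python
def equivalence(points, pairs):
    """Smallest equivalence relation on `points` containing `pairs`."""
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    for a, b in pairs:
        parent[find(a)] = find(b)
    return {(a, b) for a in points for b in points if find(a) == find(b)}


def restrict(kernel, subset):
    """Kernel of F_m(e): the kernel of e restricted to the subset X'."""
    return {(a, b) for (a, b) in kernel if a in subset and b in subset}


X = ["a1", "a2", "a3"]
X_prime = ["a1", "a3"]

k1 = equivalence(X, [("a1", "a2")])      # kernel of e_1
k2 = equivalence(X, [("a2", "a3")])      # kernel of e_2
union = equivalence(X, k1 | k2)          # kernel of e_1 ∪ e_2

# Kernel of F_m(e_1 ∪ e_2): restrict after taking the union.
restricted_union = restrict(union, X_prime)
# Kernel of F_m(e_1) ∪ F_m(e_2): restrict first, then take the union.
union_of_restricted = equivalence(
    X_prime, restrict(k1, X_prime) | restrict(k2, X_prime))

assert ("a1", "a3") in restricted_union
assert ("a1", "a3") not in union_of_restricted
```

The two kernels differ because a_2, the element that links a_1 to a_3, is not in X'.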


The following important property can be shown: 


Proposition 3. In the context of Proposition 1, if (ℰ, ℳ) is reasonable and e 
is finite, then e^f is finite. 


Proof. Due to factorization, it suffices to show the result separately for f ∈ ℰ 
and for f ∈ ℳ. Let D ⊆ X ↓ ℰ be a directed diagram and let h be a morphism 
such that e^f; h = e_D (see Definition 2). 

If f ∈ ℰ then note that f; D is also a directed diagram and that e_{f;D} = f; e_D. 
Since e is finite and since e; (f^e; h) = e_{f;D}, there is some e_i: X → • in D such 
that f; e_i factors through e, i.e., there is some morphism g_i such that f; e_i = e; g_i. 
By the pushout property of e and f, it follows that there is some morphism h_i 
such that e^f; h_i = e_i and f^e; h_i = g_i. Hence, e_i factors through e^f, so e^f is finite. 

If f ∈ ℳ then, since (ℰ, ℳ) is reasonable, there is some morphism m ∈ ℳ with 
f; e_D = e_{F_f(D)}; m. Then e; (f^e; h) = e_{F_f(D)}; m, so by the diagonal-fill-in 
property there is a morphism h' such that e; h' = e_{F_f(D)} and h'; m = f^e; h. 
Since e is finite and F_f(D) is directed, there is some e_i ∈ D such that F_f(e_i) 
factors through e, so there is an h'_i with F_f(e_i) = e; h'_i. Therefore, 
e; (h'_i; m_i) = f; e_i, where m_i ∈ ℳ is such that f; e_i factors as F_f(e_i); m_i, so 
by the pushout property there is some morphism h_i such that e^f; h_i = e_i and 
f^e; h_i = h'_i; m_i. Hence e_i factors through e^f, so e^f is also finite. 


7 Completeness 


In this section we fix the following 


Framework: A category C that 
— admits a reasonable factorization system (ℰ, ℳ), 
— has enough ℰ-projectives, 
— is ℰ-co-well-powered, 
— has colimits (actually only directed colimits and certain pushouts are needed), 


and show that, under appropriate finiteness conditions, the four rules presented 
in Section 5 are complete wrt satisfaction as injectivity. 

The usual notion of closure under inference rules is extended to classes of 
epics; in particular, D ⊆ X ↓ ℰ is closed under ℰ-Substitution iff for any e: Y → • 
in E and any f: P_Y → X, if e_Y^f is in D then so is (e_Y; e)^f. Notice that D_X(E) 
is closed under all the four inference rules, so it is non-empty (because of closure 
under Identity) and directed (due to closure under Union). If D_X(E) is not a set 
then, since C is ℰ-co-well-powered, it can be replaced by some representative set 
that it includes, so we can let e_{D_X(E)}: X → X_{D_X(E)} denote its colimit object, 
as usual (see Definition 2). Then, with the notation in Definition 2, 


Theorem 2. If E contains only equations of finite conditions, then 


1. X_D ⊨ E for any non-empty directed diagram D ⊆ X ↓ ℰ closed under 
Restriction and ℰ-Substitution; 
2. For any equation e of source X, E ⊨ e iff X_{D_X(E)} ⊨ e; 
3. Completeness. E ⊨ e implies E ⊢ e whenever e is finite. 





Proof. 1. Let e: Y → • be any equation in E and let g: Y → X_D be a morphism. 
Since P_Y is ℰ-projective and since e_D ∈ ℰ, there is a morphism f: P_Y → X 
such that e_Y; g = f; e_D. 

[Diagram: P_Y → Y → • along e_Y and e, f: P_Y → X, g: Y → X_D and e_D: X → X_D.] 

Since e_Y^f is an arrow in the pushout of f and e_Y, e_D factors through e_Y^f, and 
since e_Y^f is finite (Proposition 3) and D is directed and non-empty, there is an e' 
in D which factors through e_Y^f. It follows then that e_Y^f ∈ D because D is closed 
under Restriction, and further that (e_Y; e)^f ∈ D because D is closed under 
ℰ-Substitution. Thus there is a morphism γ such that (e_Y; e)^f; γ = e_D. Notice that 
e_Y; (e; f^{(e_Y;e)}; γ) = f; (e_Y; e)^f; γ = f; e_D = e_Y; g, so e; (f^{(e_Y;e)}; γ) = g because 
e_Y is an epimorphism. Therefore, X_D ⊨ e. 

2. If E ⊨ e then by 1., noticing that D_X(E) is closed under Restriction and 
ℰ-Substitution and is directed (because it is closed under Union) and non-empty 
(because it is closed under Identity), it follows that X_{D_X(E)} ⊨ e. Conversely, if 
X_{D_X(E)} ⊨ e then there is an e' such that e; e' = e_{D_X(E)}. Let A ⊨ E and let 
h: X → A. Since A ⊨ D_X(E), for each e_j ∈ D_X(E) there is a β_j such that 
e_j; β_j = h. Then A together with the morphisms β_j form a cocone in C for D_X(E), 
so there is a unique g: X_{D_X(E)} → A such that γ_j; g = β_j for all e_j ∈ D_X(E). It 
follows then that e; (e'; g) = e_{D_X(E)}; g = e_j; γ_j; g = e_j; β_j = h, that is, A ⊨ e. 

3. X_{D_X(E)} ⊨ e by 2., so there is an e' such that e; e' = e_{D_X(E)}. Since e is finite 
and since D_X(E) is non-empty, there is an e_j in D_X(E) which factors through 
e. Since E ⊢ e_j, by Restriction it follows that E ⊢ e. 


























Therefore, under reasonable and necessary finiteness conditions, the four-rule 
inference system can be used to derive any arrow e which is injectively satisfied 
by all objects satisfying E. On the one hand, this can be regarded as a purely 
categorical characterization result, independent of logics. On the other hand, 
instantiated to equational logics it gives an inference system which can derive any 
conditional equational semantic consequence directly. For example, the 
Identity rule corresponds to reflexivity; the Union rule corresponds to closure under 
transitivity and congruence closure over conclusions of conditional equations, 
assuming that they have (provably) the same hypotheses (note that closure 
under symmetry is implicit, because kernels of morphisms are symmetric binary 
relations); the Restriction rule allows one to retain only a part of the conclusion of 


170 Grigore Rosu 


a conditional equation, in case one proved more than needed; finally, the 
ℰ-Substitution rule corresponds as expected to substitution, but note that it can 
also derive conditional equations (when X is not projective). 


8 Conclusion and Future Work 


We presented a four rule categorical deduction system for a categorical abstrac- 
tion of equational logics, in which equations are regarded as epimorphisms and 
their satisfaction as injectivity. We showed that under reasonable finiteness con- 
ditions, the four rule deduction system is complete. The research presented in 
this paper is part of a project aiming at developing a categorical framework 
in which axiomatizability, complete deduction and interpolation can be treated 
uniformly. Birkhoff variety and quasi-variety results for equations regarded as 
epics and for satisfaction regarded as injectivity are known and considered folk- 
lore among category theorists. The results in this paper show that there is also 
complete deduction within this framework. We are not aware of other similar 
categorical completeness results in the literature, except previous work by the 
author [35] where only unconditional axioms were supported and some 
interesting results by Diaconescu [14] within his category-based equational logic, where 
equations were regarded as parallel pairs of arrows and his five inference rules 
were the typical ones for equational deduction. 

There is much challenging research to be done. Can the Craig-like interpola- 
tion results in be instantiated to the categorical equational logic framework 
presented in this paper? Can the present results be dualized, thereby obtaining 
complete deduction for some variant of modal or coalgebraic logics? Would it be 
possible to implement the four rules and thus develop an arrow-based, perhaps 
graphical, equational reasoning engine? 


Dedication. The author dedicates this paper to his former PhD adviser, Joseph 
Goguen, whom he warmly thanks for all his teachings and the unforgettable time 
spent at the University of California at San Diego. The author is also grateful to 
Joseph Goguen for his enthusiasm for categorical approaches to equational logics, 
and in particular for his encouragement in writing this material up. 


Abstract. Superpositions are useful relationships between programs or 
components in component based approaches to software development. 
We study the application of invasive superposition morphisms between 
components in the architecture design language CommUnity. This kind 
of morphism allows us to characterise component extension relationships, 
and in particular, serves an important purpose for enhancing components 
to implement certain aspects, in the sense of aspect oriented software de- 
velopment. We show how this kind of morphism combines with regulative 
superposition and refinement morphisms, on which CommUnity relies, 
and illustrate the need and usefulness of extension morphisms for the 
implementation of aspects, in particular, certain fault tolerance related 
aspects, by means of a case study. 


1 Introduction 


The demand for adequate methodologies for modularising designs and develop- 
ment is increasing rapidly, due to the inherent complexities of modern software 
systems. Of course, these modularisation methodologies do not affect only the 
final implementation stages, but they also have an impact on earlier stages of 
software development processes. Thus, it is generally accepted that, for the mod- 
ularisation to be effective (and persistent, and resistant to evolution), it needs 
to be applied from the start, at the level of specification or modelling of systems. 
Modularising, or structuring, specifications has important benefits. It allows one 
to divide the specifications into manageable parts, and to evaluate the conse- 
quences of our architectural design decisions prior to the implementation of the 
system. Moreover, it also favours the reuse of parts of the resulting implemen- 
tations, and their adaptations and extensions for new application domains. 

In the area of critical systems, specification languages are required to have 
a precise meaning (since formal semantics is crucial for eliminating ambigui- 
ties in specifications, and for developing tools for verification), and therefore 
specifications tend to be much longer than those of informal frameworks. Thus, 
mechanisms for structuring or modularising specifications and designs are 
especially important for formal specification languages, as they help to make the 
specification and verification activities scalable. There exist many formal 
specification languages which put an emphasis on the way systems are built out of 
components (e.g., those reported in [[9]9[20]18]). CommUnity is one of these 
languages; it is a formal program design language which puts special emphasis 
on ways of composing abstract designs of components to form designs of systems 
[6, 5]. CommUnity is based on Unity [8] and IP [8], and its foundations lie in the 
categorical approach to systems design [10]. Its mechanisms for composing 
specifications have a formal interpretation in terms of category theoretic constructions 
[5, 6]. Moreover, CommUnity’s composition mechanisms combine nicely with a 
sophisticated notion of refinement, which involves separate concepts of action 
blocking and action progress. 


We are particularly interested in CommUnity because, in our view, its design 
composition mechanisms make it suitable for the specification and combination 
(or “weaving”) of aspects, in the aspect oriented software development sense 
[7]. Moreover, its rather abstract designs for components allow us to deal with 
aspects at a design level, in contrast to most of the work on aspects, which 
concerns implementation related stages (e.g., [14]). Some evidence of the 
adequacy of CommUnity as a design language for aspects is provided by the possibility of 
defining higher-order connectors [16]. As shown in [16], a wide variety of aspects 
(e.g., fault tolerance, security, monitoring, compression, etc.) can be 
superimposed on existing CommUnity architectures, by building “stacks” of more and 
more complex connectors between components. 


Higher-order connectors provide a very convenient way of enhancing the be- 
haviour of an architecture of component designs, by the superimposition of as- 
pects. A crucial characteristic of CommUnity, which makes this possible, is the 
complete externalisation of the definition of interaction between components 
(a feature also exhibited by various other architecture description languages). 
The component coordination mechanism of CommUnity reduces the coupling 
between components to a minimum, and makes it feasible to superimpose be- 
haviour (related to aspects) on existing systems via superposition and refinement 
of components. However, higher-order connectors are not powerful enough for 
defining various kinds of aspects, since some of these, as we will show, require 
extensions of the components as well as of the connectors. Thus, we are forced 
to consider another kind of superposition, known as invasive superposition, 
which allows us to define extensions of components. By combining extension with 
regulative superposition and refinement, we believe that we obtain a powerful 
framework in which we can define architectures, and enhance their behaviours 
by superimposing behaviour through aspects defined in terms of component 
extension and higher-order connectors. Having the possibility of extending com- 
ponents also provides us with a way of balancing the distribution of extended 
behaviour among connectors and components, which would otherwise be put 
exclusively on the connector side. This problem has also arisen in the context 
of object oriented design and programming, attempting to define various forms 
of inheritance, resulting in the proposals attempting to characterise the concept 
of substitutability [L521]. We believe that this proposal provides a more solid 
foundation for substitutivity, one that is better structured and more amenable 
to analysis. We propose a definition of extension in CommUnity, partly mo- 
tivated by the definitions and proof obligations used to define the structuring 
mechanisms in B [J4], that justifies the notion of substitutivity and provides a 
structuring principle for augmenting components by breaking encapsulation of 
the component. (Perhaps this should be considered a contradiction in terms!) 

We show how extension morphisms combine with the superposition and re- 
finement morphisms already present in CommUnity. We will also illustrate the 
need and usefulness of extension morphisms for the implementation of aspects, 
by means of a case study, based on a simple sender/receiver architecture com- 
municating via an unreliable channel, which is then enhanced with some typical 
aspects, imposing a standard fault tolerance mechanism. 


2 The Architecture Design Language CommUnity 


In this section, we introduce the reader to the CommUnity design language and 
its main features, by means of an example. The computational units of a system 
are specified in CommUnity through designs. Designs are abstract programs, in 
the sense that they describe a class of programs (more precisely, the class of all 
the programs one might obtain from the design by refinement), rather than a 
single program [23)5]. 

Before describing in some detail the refinement and composition mechanisms 
of CommUnity, let us describe the main constituents of a CommUnity design. 
Let us first assume that we have a fixed set ADT = (Σ_ADT, Φ_ADT) of datatypes, 
specified as usual via a first-order specification. A CommUnity design is 
composed of: 


— A set V of channels, typed with sorts in ADT. V is partitioned into three 
subsets V_in, V_prv and V_out, corresponding to input, private and output 
channels, respectively. Input channels are the ones controlled, from the point of 
view of the component, by the environment. Private and output channels are 
the local channels of the component. The difference between these is that 
output channels can be read by the environment, whereas private channels 
cannot. 

— A first-order sentence Init(V), describing the initial states of the design. 
(Some versions of CommUnity, such as the one presented in [17], do not include an 
initialisation constraint.) 

— A set Γ of actions, partitioned into private actions Γ_prv and public actions 
Γ_pub. Each action g ∈ Γ is of the form: 


g[D(g)] : L(g), U(g) → R(g) 


where D(g) ⊆ V_prv ∪ V_out is the (write) frame of g (the local channels that 
g modifies), L(g) and U(g) are two first-order sentences such that Φ_ADT ⊢ 
U(g) ⇒ L(g), called the lower and upper bound guards, respectively, and 
R(g) is a first-order sentence α(V ∪ D(g)'), indicating how the action g 
modifies the values of the variables in its frame. (D(g) is a set of channels 
and D(g)' is the corresponding set of “primed” versions of the channels in 
D(g), representing the new values of the channels after the execution of the 
action g.) 


The two guards L(g) and U(g) associated with an action g are related to re- 
finement, in the sense that the actual guard of an action g, implementing the 
abstract action g, must lie between L(g) and U(g). As explained in [17], the 
negation of L(g) establishes a blocking condition (L(g) can be seen as a lower 
bound on the actual guard of an action implementing g), whereas U(g) estab- 
lishes a progress condition (i.e., an upper bound on the actual guard of an action 
implementing g, in the sense that it implies the enabling condition of an action 
implementing g). 

Of course, R(g) might not uniquely determine values for the variables D(g)'.
As explained in [17], R(g) is typically composed of a conjunction of implications
$pre \Rightarrow post$, where pre is a precondition and post defines a multiple assignment.

To clarify the definition of CommUnity designs, let us suppose that we would 
like to model the unreliable communication between a sender and a receiver. We 
will abstract away from the actual contents of messages between these compo- 
nents, and represent them simply by an integer, identifying particular messages. 
Then, a sender is a simple CommUnity design composed of: 


— An output channel msg: int, representing the current message of the sender. 

— A private channel rts: bool (“ready to send”), indicating whether the 
sender is ready to send the current message or not (messages need to be 
produced before sending them). 

— An action send, which, if the sender is ready to send (indicated by the 
boolean variable above), then goes back to a “ready to produce” state (char- 
acterised by the rts variable being false). 

— An action prod, that, if the sender is in a “ready to produce” state, incre- 
ments by one the msg variable (i.e., generates a new message to be sent) and 
moves to a “ready to send” state. 


The CommUnity design corresponding to this component is shown in Figure 1.

In Fig. 1, the actions of the design have a single guard, meaning that their
lower and upper bound guards coincide. We will illustrate refinement through 
more abstract designs below. An important point to notice in the sender design 
is the way it communicates with the environment through the send action. This 
action does not make a call to an external action, as one might expect; it will 
be the responsibility of other components to “extract” the value of the output 
variable msg, by synchronising other actions with the send action of the sender. 
This will become clearer later on, when we build architectures and describe in 
more detail the model of interaction between components in CommUnity. 
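The Sender design of Fig. 1 can be animated with a few lines of Python. The following is only a minimal sketch, not part of CommUnity or its Workbench: the names `Action`, `step` and `run` are hypothetical, the single guard stands for the case where L(g) and U(g) coincide, and each assignment R(g) is assumed to be deterministic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

State = Dict[str, object]


@dataclass(frozen=True)
class Action:
    name: str
    frame: FrozenSet[str]              # D(g): channels the action may write
    guard: Callable[[State], bool]     # single guard, i.e. L(g) = U(g)
    effect: Callable[[State], State]   # deterministic instance of R(g)


def step(state: State, action: Action) -> State:
    """Execute one enabled action; channels outside the frame keep their values."""
    if not action.guard(state):
        raise ValueError(f"{action.name} is not enabled in {state}")
    updates = action.effect(state)
    assert set(updates) <= action.frame, "an action may only write its frame"
    return {**state, **updates}


# The Sender design of Fig. 1 (init: msg=0 and rts=false).
SENDER_INIT: State = {"msg": 0, "rts": False}
prod = Action("prod", frozenset({"msg", "rts"}),
              guard=lambda s: not s["rts"],
              effect=lambda s: {"rts": True, "msg": s["msg"] + 1})
send = Action("send", frozenset({"rts"}),
              guard=lambda s: s["rts"],
              effect=lambda s: {"rts": False})


def run(init: State, actions: List[Action]) -> List[State]:
    """Return the finite trace obtained by firing the given actions in order."""
    states = [dict(init)]
    for action in actions:
        states.append(step(states[-1], action))
    return states


if __name__ == "__main__":
    for s in run(SENDER_INIT, [prod, send, prod]):
        print(s)
```

Note that `send` only resets `rts`: as in the design, nothing in the sender itself transfers the value of `msg`, which is left to whatever is synchronised with `send`.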

To complete the picture, let us introduce some further designs. One is a simple 
component with a single integer typed output variable, used for communication 
Design Sender
out
   msg: int
prv
   rts: bool
init
   msg=0 ∧ rts=false
do
   prod[msg,rts]: ¬rts → rts'=true ∧ msg'=msg+1
   [] send[rts]: rts → rts'=false


Fig. 1. A CommUnity design for a simple sender component. 


and for modelling the loss of messages (Figure 2). The other one is a receiver
component, somewhat similar in structure to the sender, but with an input
variable instead of an output one, and a boolean channel rtr (ready to receive)
instead of rts (Figure 3).


Design Communication_Medium
in
   in_msg: int
out
   out_msg: int
prv
   rts: bool
init
   out_msg=0 ∧ rts=false
do
   transmit[out_msg,rts]: ¬rts → out_msg'=in_msg ∧ rts'=true
   [] lose[]: ¬rts → true
   [] send[rts]: rts → rts'=false


Fig. 2. A CommUnity design for an unreliable communication medium. 
Design Receiver
in
   msg: int
out
   curr_msg: int
local
   rtr: bool
init
   curr_msg=0 ∧ rtr=true
do
   rec[rtr,curr_msg]: rtr → rtr'=false ∧ curr_msg'=msg
   [] prv cons[rtr]: ¬rtr → rtr'=true


Fig. 3. A CommUnity design for a receiver component. 


2.1 Refinement Morphisms 


Refinement morphisms constitute an important relationship between CommU- 
nity designs. Not only do these morphisms allow one to establish “realisation” 
relationships, indicating that a component is a more refined or concrete version 
of another one, but they also serve an important purpose for parameter instanti- 
ation. In particular, refinement morphisms are essential for the implementation 
of higher-order connectors [16].


We will not give a fully detailed description of refinement morphisms here. We
refer the interested reader to [6,23,16,17,5] for a detailed account of refinement
in CommUnity.

A refinement morphism $\sigma : P_1 \rightarrow P_2$ between designs $P_1 = (V_1, \Gamma_1)$ and
$P_2 = (V_2, \Gamma_2)$ consists of a total function $\sigma_{ch} : V_1 \rightarrow V_2$ and a partial function
$\sigma_{ac} : \Gamma_2 \rightarrow \Gamma_1$ such that:

— $\sigma_{ch}$ preserves the sorts and kinds (output, input or private) of channels;
moreover, $\sigma_{ch}$ is injective on input and output channels,

— $\sigma_{ac}$ maps shared actions to shared actions and private actions to private
actions; moreover, every shared action in $\Gamma_1$ has at least one corresponding
action in $\Gamma_2$ (via $\sigma_{ac}$),

— the initialisation condition is strengthened through the refinement, i.e.,
$\Phi_{ADT} \vdash Init_{P_2} \Rightarrow \sigma(Init_{P_1})$,

— every action $g \in \Gamma_2$ whose frame $D_2(g)$ includes a channel $\sigma_{ch}(v)$, with
$v \in V_1$, is mapped to an action $\sigma_{ac}(g)$ whose frame $D_1(\sigma_{ac}(g))$ includes v,

— if an action $g \in \Gamma_2$ is mapped to an action $\sigma_{ac}(g)$, then
$\Phi_{ADT} \vdash L_2(g) \Rightarrow \sigma(L_1(\sigma_{ac}(g)))$ and $\Phi_{ADT} \vdash R_2(g) \Rightarrow \sigma(R_1(\sigma_{ac}(g)))$,

— for every action $g \in \Gamma_1$, $\Phi_{ADT} \vdash \sigma(U_1(g)) \Rightarrow \bigvee_{h \in \sigma_{ac}^{-1}(g)} U_2(h)$.


As specified by these conditions, the interval determined by the lower and
upper bound guards can be reduced through refinement, and the assignments can
be strengthened. The interface of a component design, determined by the output
and input channels and shared actions, is preserved along refinement morphisms,
and the new actions that can be defined in a refinement are not allowed to modify
the channels originating in the abstract component. Essentially, one can refine a
component by making its actions more detailed and less underspecified
(characterised by the reduction of the guard interval and the strengthening of
the assignments), and possibly adding more detail to the component, in the form
of further channels or actions [17].

As an example of refinement, consider the more abstract version of the sender
design shown in Figure 4. Notice that the assignment associated with prod is
more abstract or liberal than the assignment of the same action in the Sender
design. Also, the lower bound guards of both actions are equivalent to those of the
corresponding actions in Sender, but the upper bound guards are strengthened
to false. Clearly, Abstract_Sender is a more abstract version of the Sender
(or, equivalently, Sender is a refinement of Abstract_Sender), and it is not
difficult to prove that there exists a refinement morphism between these designs.
In fact, the Communication_Medium component is also a refinement of
Abstract_Sender (where the abstract prod operation corresponds to the operations
lose and transmit), although it is less evident than in the first case.
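The guard and assignment conditions relating Sender and Abstract_Sender can be checked by brute force on a small state space. The sketch below is illustrative only: it assumes the identity mapping on channels and actions, replaces `int` by a small finite range so that all states can be enumerated, and uses hypothetical names (`ABSTRACT`, `CONCRETE`, `refines`).

```python
from itertools import product

MSGS = range(4)  # finite stand-in for int (an assumption made for enumeration)
STATES = [{"msg": m, "rts": r} for m, r in product(MSGS, (False, True))]


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


# Each action is a triple (L, U, R): L and U are predicates on a state,
# R relates a pre-state s to a post-state t.
ABSTRACT = {  # Abstract_Sender (Fig. 4): upper bound guards are false
    "prod": (lambda s: not s["rts"], lambda s: False,
             lambda s, t: t["rts"]),            # msg' may take any value
    "send": (lambda s: s["rts"], lambda s: False,
             lambda s, t: not t["rts"]),
}
CONCRETE = {  # Sender (Fig. 1): lower and upper bound guards coincide
    "prod": (lambda s: not s["rts"], lambda s: not s["rts"],
             lambda s, t: t["rts"] and t["msg"] == s["msg"] + 1),
    "send": (lambda s: s["rts"], lambda s: s["rts"],
             lambda s, t: not t["rts"]),
}


def refines(abstract, concrete) -> bool:
    """Check L2 => L1, U1 => U2 and R2 => R1 for identically named actions."""
    for name, (l1, u1, r1) in abstract.items():
        l2, u2, r2 = concrete[name]
        for s in STATES:
            if not implies(l2(s), l1(s)):
                return False
            if not implies(u1(s), u2(s)):
                return False
            for t in STATES:
                if not implies(r2(s, t), r1(s, t)):
                    return False
    return True
```

With these definitions `refines(ABSTRACT, CONCRETE)` holds, while the converse check fails on the upper bound guard of prod, which matches the intuition that refinement narrows the guard interval.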


Design Abstract_Sender
out
   msg: int
prv
   rts: bool
init
   msg=0 ∧ rts=false
do
   prod[msg,rts]: ¬rts, false → rts'=true ∧ msg'∈int
   [] send[rts]: rts, false → rts'=false


Fig. 4. A more abstract CommUnity design for a sender component. 


2.2 Component Composition 


In order to build a system out of the above components, we need a mechanism 
for composition. The mechanism for composing designs in CommUnity is based
on action synchronisation and the “connection” of output channels to input 
channels (shared memory). Basically, we need to connect the sender and receiver 
through the unreliable medium. This can be achieved by: 


— identifying the output channel msg of the sender with the input channel 
in_msg of the medium, 

— identifying the input channel msg of the receiver with the output channel 
out_msg of the medium, 


180 Nazareno Aguirre, Tom Maibaum, and Paulo Alencar 


— synchronising the action send of the sender with actions transmit and lose 
of the medium, 
— synchronising the action send of the medium with action rec of the receiver. 


The resulting architecture can be graphically depicted as shown in Figure 5. In
this diagram, the architecture is shown using the CommUnity Workbench
graphical notation, where boxes represent designs, with their channels and actions,
and lines represent the interactions (“cables” in the sense of [17]), indicating
how input channels are connected to output channels, and which actions are
synchronised. Notice that, in particular, action send of the sender is connected
to two different actions of the medium; this requires that, in the resulting system,
there will be two different actions corresponding to (or “invoking”) the send
action in the sender, one that is synchronised with transmit and another one
that is synchronised with lose. This allows us to model very easily the fact that,
sometimes, the sent message is lost (when the action send-lose is executed),
without using further channels in the communication medium.


[Diagram: the designs Comm_Medium and Receiver, with the actions transmit, lose and send labelled.]


Fig. 5. A graphical view of the architecture of the system. 


Semantics of Architectures. CommUnity designs have an operational semantics
based on (labelled) transition systems. Architectural configurations, of the
kind shown in Fig. 5, also have a precise semantics; they are interpreted as
categorical diagrams, representing the architecture [17]. The category has designs
as objects and the morphisms are superposition relationships. A superposition
morphism between two designs A and B captures, in a formal way, the fact that
B contains A, and uses it while respecting the encapsulation of A (regulative
superposition). The interesting fact is that the joint behaviour of the system
can be obtained by taking the colimit of the categorical diagram corresponding
to the architecture [5,6]. Therefore, one can obtain a single design (the colimit
object), capturing the behaviour of the whole system.


5 Private actions are not displayed by the CommUnity Workbench, although we de- 
cided to show these actions, conveniently annotated, in the diagrams. 
More formally, a superposition morphism $\sigma : P_1 \rightarrow P_2$ between designs
$P_1 = (V_1, \Gamma_1)$ and $P_2 = (V_2, \Gamma_2)$ consists of a total function $\sigma_{ch} : V_1 \rightarrow V_2$ and a partial
function $\sigma_{ac} : \Gamma_2 \rightarrow \Gamma_1$ such that:


— $\sigma_{ch}$ preserves the sorts of channels; private and output channels must be
mapped to channels of the same kind, but input channels can be mapped to
output channels,

— $\sigma_{ac}$ maps shared actions to shared actions and private actions to private
actions,

— the initialisation condition is strengthened through the superposition, i.e.,
$\Phi_{ADT} \vdash Init_{P_2} \Rightarrow \sigma(Init_{P_1})$,

— every action $g \in \Gamma_2$ whose frame $D_2(g)$ includes a channel $\sigma_{ch}(v)$, with
$v \in V_1$, is mapped to an action $\sigma_{ac}(g)$ whose frame $D_1(\sigma_{ac}(g))$ includes v,

— if an action $g \in \Gamma_2$ is mapped to an action $\sigma_{ac}(g)$, then
$\Phi_{ADT} \vdash L_2(g) \Rightarrow \sigma(L_1(\sigma_{ac}(g)))$, $\Phi_{ADT} \vdash R_2(g) \Rightarrow \sigma(R_1(\sigma_{ac}(g)))$, and
$\Phi_{ADT} \vdash U_2(g) \Rightarrow \sigma(U_1(\sigma_{ac}(g)))$.


As for refinement morphisms, superposition morphisms allow assignments to be
strengthened, but not weakened. Intuitively, $P_2$ enhances the behaviour of $P_1$
via the superposition of additional behaviour, described in other components
(and synchronised with $P_1$). So, the actions of the augmented component $P_2$
“using” corresponding actions in $P_1$ do at least what the actions of $P_1$
originally did. Since actions in $P_2$ should use the corresponding actions in $P_1$ within
enabledness bounds, the lower bound guards of actions in $P_1$ must be
strengthened when superposed in actions of $P_2$. Notice however that, as opposed to the
case of refinement morphisms, upper bound guards can be strengthened, but not
weakened; as explained in [17], this is a key difference between refinement and
superposition, and reflects the fact that “all the components that participate in
the execution of a joint action have to give their permission for the action to
occur” (cf. p. 9 of [17]).


3 Component Extension in CommUnity 


In this section we describe the main contribution of this paper, namely, a new 
kind of morphism between components for CommUnity. This kind of morphism, 
that we call extension morphism, enables us to establish extension relationships 
between components (of the kind defined by inheritance in object orientation), 
and is of a different nature, compared to the already existing refinement and 
superposition morphisms of CommUnity. 

In order to illustrate the need for extension morphisms, let us consider the
following case. Suppose that, for the existing system of communicating sender and
receiver, we would like to superimpose behaviour related to the monitoring of the
received messages. As explained in [16], this is possible to achieve, in an elegant
and structured way, by using higher-order connectors. Essentially, an abstract
monitoring structure is defined; this structure is composed of various abstract
designs, used for characterising roles of the architecture, like sender, receiver and
monitor, and others necessary for the implementation of the “observed
connector”. These abstract designs are interconnected as shown in Figure 6. We will not
describe these designs in detail, and refer the reader to [16], where a detailed
description of this higher-order connector is given. However, it is important to
mention that Abstract_Sender (which is given in Fig. 4) and Abstract_Receiver
can be refined by, essentially, any pair of components providing the basic
functionality for sending and receiving messages. Then, this higher-order
connector is plugged into the existing architecture, through refinement, to obtain the
resulting architecture of Figure 7. It is important to notice the difference
between Figures 6 and 7: Fig. 6 describes the (abstract, non-instantiated)
higher-order connector for monitoring, whereas Fig. 7 describes the instantiation of
this higher-order connector (see how Abstract_Sender, Abstract_Receiver and
Abstract_Monitor have been instantiated by Communication_Medium, Receiver
and Simple_Monitor, respectively). The reader might observe that the actual
monitor that we are using, described in Figure 8, simply counts the number
of messages received by the receiver component. Notice that the guard of the
monitor must be as weak as possible (i.e., true), to avoid interfering with the
behaviour of the monitored operations.

In , several aspects are characterised and superimposed by using this same 
technique. 
Fig. 6. A higher-order connector for monitoring.


Now suppose that we would like to superimpose a “resend message”
mechanism on the architecture, in order to make the communication reliable. We can
capture the loss of a “packet” through a monitor, instead of using it simply for
counting the messages, as we did before. However, for the sender to reset and
start sending the message again, we need to replace it with a slightly more
sophisticated sender component, namely one with a reset operation, such as the one
shown in Figure 9. Notice that RES_Sender cannot be obtained from Sender by
superposition, since it is clear that the new reset operation modifies a channel
originating in Sender. RES_Sender cannot be obtained through the refinement of
Fig. 7. Communication enhanced with a monitoring system.


Design Simple_Monitor
in
   msg: int
prv
   counter: int
init
   counter=0
do
   rec[counter]: true → counter'=counter+1


Fig. 8. A simple monitor to count received messages. 


Sender either, since clearly its reset action, which modifies channels originating 
in Sender, should be mapped to a corresponding action in this design, but it 
does not respect any of the original assignments of actions prod and send, so it 
cannot be mapped to any of these. 


However, there exists a clear relationship between the original Sender com- 
ponent and the new RES_Sender: the state of the original is extended, and more 
operations are provided (which might modify the channels of the original com- 
ponent), but the effect of the original actions is maintained. This relationship is 
a special case of what is called invasive superposition [12]. 


Invasive superposition has already been recognised as a possible relationship 
between CommUnity designs in [6]; moreover, therein it has been shown that 
CommUnity designs and invasive superpositions constitute a category. However, 
not much attention has been paid to invasive superposition for the architec- 
tural modelling of systems in CommUnity so far. Although not in the context 
of CommUnity, some researchers have employed various kinds of superpositions 
for defining architectures of components and augmenting their behaviours,
particularly the work in [11]. Here, we propose the use of invasive superposition for
characterising component extension in CommUnity. 
Design RES_Sender
in
   lost-msg: int
out
   msg: int
prv
   rts: bool
init
   msg=0 ∧ rts=false
do
   prod[msg,rts]: ¬rts → rts'=true ∧ msg'=msg+1
   [] send[rts]: rts → rts'=false
   [] reset[msg,rts]: true → rts'=true ∧ msg'=lost-msg


Fig. 9. A CommUnity design for a sender component with a reset capability. 


A distinguishing property typically associated with sound component
extension is what is normally known as the substitutability principle [15]. This
principle requires, in concordance with the now highly regarded “design by
contract” approach [21], that if a component $P_2$ extends another component $P_1$,
then one must be able to replace $P_1$ by $P_2$, and the “clients” of the original
component must not perceive the difference. In other words, component $P_2$ should
behave exactly as $P_1$, when put in a context where $P_1$ was expected. It is our aim
to characterise such extensions through the definition of extension morphisms
below.


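Definition 1. An extension morphism $\sigma : P_1 \rightarrow P_2$ between designs
$P_1 = (V_1, \Gamma_1)$ and $P_2 = (V_2, \Gamma_2)$ consists of a total function $\sigma_{ch} : V_1 \rightarrow V_2$ and a
partial mapping $\sigma_{ac} : \Gamma_2 \rightarrow \Gamma_1$, such that:


— $\sigma_{ch}$ is injective and $\sigma_{ac}$ is surjective,

— $\sigma_{ch}$ preserves the sorts and kinds of channels,

— $\sigma_{ac}$ maps shared actions to shared actions and private actions to private
actions,

— there exists a formula $\alpha$, using only variables that are contained in
$(V_2 - \sigma_{ch}(V_1))$, and such that $\Phi_{ADT} \vdash \exists \bar{v} : \alpha(\bar{v})$ and
$\Phi_{ADT} \vdash Init_{P_2} \Leftrightarrow \sigma(Init_{P_1}) \wedge \alpha$,

— for every $g \in \Gamma_2$ such that $\sigma_{ac}(g)$ is defined, and for every $v \in V_1$, if
$\sigma_{ch}(v) \in D_2(g)$, then $v \in D_1(\sigma_{ac}(g))$,

— if an action $g \in \Gamma_2$ is mapped to an action $\sigma_{ac}(g)$, then
$\Phi_{ADT} \vdash \sigma(L_1(\sigma_{ac}(g))) \Rightarrow L_2(g)$ and $\Phi_{ADT} \vdash \sigma(U_1(\sigma_{ac}(g))) \Rightarrow U_2(g)$,

— for every $g \in \Gamma_2$ such that $\sigma_{ac}(g)$ is defined, there exists a formula $\alpha$, using
only primed variables that are contained in $(V_2' - \sigma_{ch}(V_1)')$, such that
$\Phi_{ADT} \vdash \sigma(L_1(\sigma_{ac}(g))) \Rightarrow (R_2(g) \Leftrightarrow \sigma(R_1(\sigma_{ac}(g))) \wedge \alpha)$ and
$\Phi_{ADT} \vdash \exists \bar{v}' : \alpha(\bar{v}')$, where $\bar{v}'$ represents the primed variables of $\alpha$.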











The first condition for extension morphisms requires all actions of the original 
component to be mapped to actions in the extended one, and the preservation 
of all the channels of the original component. In particular, it is not allowed for
several channels to be mapped to a single channel in the extended component. 
(Notice that if this was allowed, then the extended component might not be 
“plugged” into architectures where the original component could be “plugged”, 
due to insufficient “ports” in the interface.) The second and third conditions 
above require the types and kinds of channels and actions to be preserved. The 
fourth condition allows the initialisation to be strengthened when a component 
is extended, but respecting the initialisation of the channels of the original com- 
ponent, and via realisable assignments for the new variables. The fifth condition 
indicates that “old actions” of the extended component can modify new vari- 
ables, but the only old variables these can modify are the ones they already 
modified in the original component (in other words, frames can be expanded 
only with new channels). The sixth condition establishes that both the lower 
and upper bound guards can be weakened, but not strengthened. Finally, the 
last condition establishes that the actions corresponding to those of the orig- 
inal component must preserve the assignments to old variables, if the lower 
bound guard of the original component is satisfied; this provides the extension 
with some freedom, to decide how the action might modify old and new vari- 
ables when executed under circumstances where the original action could not 
be executed. Again, it is required for the assignments for new variables to be 
“realisable”.

Going back to our example, notice that RES_Sender is indeed an extension
of Sender, where the associated extension morphism $\sigma = (\sigma_{ch}, \sigma_{ac})$ is composed
of the identity mappings $\sigma_{ch}$ and $\sigma_{ac}$ on channels and actions, respectively
(with $\sigma_{ac}$ undefined on the new action reset). It
is clear that these mappings are injective and surjective, respectively, and that
sorts and kinds of channels are preserved by $\sigma_{ch}$, and the visibility constraints
on actions are preserved by $\sigma_{ac}$. Moreover, since the initialisation and the write
frames, guards and assignments of actions send and prod are not modified in
the extension, the last four conditions in the definition of extension morphisms
are trivially met.
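The action-related conditions of Definition 1 can be checked mechanically for this example. The following sketch is an illustration under several assumptions: `int` is replaced by a small finite range, the channel lost-msg is written `lost_msg`, guards are single (so lower and upper bounds coincide), and no mapped action writes a new channel, so the formula $\alpha$ of the last condition is simply true. The names `is_extension` and `SIGMA_AC` are hypothetical.

```python
from itertools import product

MSGS = range(3)  # finite stand-in for int (an assumption made for enumeration)
STATES = [{"msg": m, "rts": r, "lost_msg": l}
          for m, r, l in product(MSGS, (False, True), MSGS)]
OLD_CHANNELS = {"msg", "rts"}

# name: (frame, guard, R), where R relates a pre-state s to a post-state t
SENDER = {
    "prod": ({"msg", "rts"}, lambda s: not s["rts"],
             lambda s, t: t["rts"] and t["msg"] == s["msg"] + 1),
    "send": ({"rts"}, lambda s: s["rts"],
             lambda s, t: not t["rts"]),
}
RES_SENDER = dict(SENDER)
RES_SENDER["reset"] = ({"msg", "rts"}, lambda s: True,
                       lambda s, t: t["rts"] and t["msg"] == s["lost_msg"])

# Partial action mapping from RES_Sender to Sender; reset is unmapped.
SIGMA_AC = {"prod": "prod", "send": "send"}


def is_extension(original, extended, sigma_ac, old_channels, states) -> bool:
    if set(sigma_ac.values()) != set(original):      # sigma_ac is surjective
        return False
    for g2, g1 in sigma_ac.items():
        frame1, guard1, rel1 = original[g1]
        frame2, guard2, rel2 = extended[g2]
        if not (frame2 & old_channels) <= frame1:    # frames grow only by new channels
            return False
        for s in states:
            if guard1(s) and not guard2(s):          # guards may only be weakened
                return False
            # under the original guard, the assignment is preserved
            if guard1(s) and any(rel1(s, t) != rel2(s, t) for t in states):
                return False
    return True
```

For instance, `is_extension(SENDER, RES_SENDER, SIGMA_AC, OLD_CHANNELS, STATES)` holds, whereas a variant of RES_Sender whose prod increments msg by two would be rejected by the assignment check.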

Notice that extension morphisms are invasive, in the sense that new ac- 
tions in the extended component are allowed to modify variables of the origi- 
nal component. However, extension morphisms differ from invasive superposi- 
tion morphisms, as formalised in [6] in various ways. In particular, guards are 
weakened in extension morphisms, whereas these are strengthened in invasive 
superposition morphisms. Moreover, our allowed forms of assignment and ini- 
tialisation strengthening are more restricted than those of invasive superposition 
morphisms. 

It is not difficult to prove the following theorem, showing that, as for other 
morphisms in CommUnity, designs and extension morphisms constitute a cate- 
gory. 


Theorem 1. The structure composed of CommUnity designs and extension
morphisms constitutes a category, where the composition of two morphisms $\sigma_1$ and
$\sigma_2$ is defined in terms of the composition of the corresponding channel and action
mappings of $\sigma_1$ and $\sigma_2$.
Proof. The proof can be straightforwardly reduced to proving that the
composition of extension morphisms is an extension morphism (the remaining points
to prove that the proposed structure is a category are straightforward). So, let
$\sigma_1 : P_1 \rightarrow P_2$ and $\sigma_2 : P_2 \rightarrow P_3$ be extension morphisms. The composition $\sigma_1;\sigma_2$
is defined by the composition of the corresponding mappings of these morphisms.

Let us prove each of the restrictions concerning the definition of extension
morphism.


— First, $\sigma_{1_{ch}};\sigma_{2_{ch}}$ must be injective, and $\sigma_{2_{ac}};\sigma_{1_{ac}}$ must be surjective; this is
easy to show, since the composition of injective mappings is injective, and
the composition of surjective mappings is surjective.

— It is clear that since both $\sigma_{1_{ch}}$ and $\sigma_{2_{ch}}$ preserve the sorts and kinds of
channels, so does the composition $\sigma_{1_{ch}};\sigma_{2_{ch}}$.

— We have as hypotheses that there exist two formulas $\alpha_1$ and $\alpha_2$, referring to
variables in $(V_2 - \sigma_{1_{ch}}(V_1))$ and $(V_3 - \sigma_{2_{ch}}(V_2))$ respectively, and such that
$\Phi_{ADT} \vdash Init_{P_2} \Leftrightarrow \sigma_1(Init_{P_1}) \wedge \alpha_1$ and $\Phi_{ADT} \vdash Init_{P_3} \Leftrightarrow \sigma_2(Init_{P_2}) \wedge \alpha_2$;
moreover, both these formulas are “satisfiable”, in the sense that
$\Phi_{ADT} \vdash \exists \bar{v}_1 : \alpha_1(\bar{v}_1)$ and $\Phi_{ADT} \vdash \exists \bar{v}_2 : \alpha_2(\bar{v}_2)$. We must show that there exists a
formula $\alpha_3$, using only variables that are contained in $(V_3 - \sigma_{1_{ch}};\sigma_{2_{ch}}(V_1))$,
such that $\Phi_{ADT} \vdash \exists \bar{v}_3 : \alpha_3(\bar{v}_3)$ and $\Phi_{ADT} \vdash Init_{P_3} \Leftrightarrow \sigma_1;\sigma_2(Init_{P_1}) \wedge \alpha_3$.
We propose $\alpha_3 = \sigma_{2_{ch}}(\alpha_1) \wedge \alpha_2$.

  • The fact that $\alpha_3$ refers only to variables in $(V_3 - \sigma_{1_{ch}};\sigma_{2_{ch}}(V_1))$ is obvious.

  • Let us prove that $\alpha_3$ is satisfiable. First, since $\alpha_1$ is satisfiable, so is
  $\sigma_{2_{ch}}(\alpha_1)$ (satisfiability is preserved under injective language translation).
  Second, it is easy to see that $\sigma_{2_{ch}}(\alpha_1)$ and $\alpha_2$ refer to disjoint sets of
  variables; therefore (and since the only free variables allowed in initialisation
  conditions are the ones corresponding to channels), the satisfiability of
  the conjunction $\sigma_{2_{ch}}(\alpha_1) \wedge \alpha_2$ is guaranteed.

  • Let us now prove that $\Phi_{ADT} \vdash Init_{P_3} \Leftrightarrow \sigma_1;\sigma_2(Init_{P_1}) \wedge \alpha_3$. We know
  that $\Phi_{ADT} \vdash Init_{P_3} \Leftrightarrow \sigma_2(Init_{P_2}) \wedge \alpha_2$, and that
  $\Phi_{ADT} \vdash Init_{P_2} \Leftrightarrow \sigma_1(Init_{P_1}) \wedge \alpha_1$. Combining these two hypotheses, we straightforwardly
  get that $\Phi_{ADT} \vdash Init_{P_3} \Leftrightarrow \sigma_2(\sigma_1(Init_{P_1}) \wedge \alpha_1) \wedge \alpha_2$, which leads us to
  $\Phi_{ADT} \vdash Init_{P_3} \Leftrightarrow (\sigma_2(\sigma_1(Init_{P_1})) \wedge \sigma_2(\alpha_1)) \wedge \alpha_2$, as we wanted.

— We have to prove that for every $g \in \Gamma_3$ such that $\sigma_{2_{ac}};\sigma_{1_{ac}}(g)$ is defined, and
for every $v \in V_1$, if $\sigma_{1_{ch}};\sigma_{2_{ch}}(v) \in D_3(g)$, then $v \in D_1(\sigma_{2_{ac}};\sigma_{1_{ac}}(g))$. This
is straightforward, thanks to our hypotheses regarding frame preservation of
morphisms $\sigma_1$ and $\sigma_2$.

— To prove that the composition of the morphisms $\sigma_1$ and $\sigma_2$ weakens both
the lower and the upper bound guards is also straightforward.

— We have as hypotheses that:

  • for every $g \in \Gamma_2$ such that $\sigma_{1_{ac}}(g)$ is defined, there exists a formula $\alpha_1$
  whose primed variables are contained in $(V_2' - \sigma_{1_{ch}}(V_1)')$ such
  that: $\Phi_{ADT} \vdash \exists \bar{v}_1' : \alpha_1(\bar{v}_1')$ and
  $\Phi_{ADT} \vdash \sigma_1(L_1(\sigma_{1_{ac}}(g))) \Rightarrow (R_2(g) \Leftrightarrow \sigma_1(R_1(\sigma_{1_{ac}}(g))) \wedge \alpha_1)$,

  • for every $g \in \Gamma_3$ such that $\sigma_{2_{ac}}(g)$ is defined, there exists a formula $\alpha_2$
  whose primed variables are contained in $(V_3' - \sigma_{2_{ch}}(V_2)')$ such
  that: $\Phi_{ADT} \vdash \exists \bar{v}_2' : \alpha_2(\bar{v}_2')$ and
  $\Phi_{ADT} \vdash \sigma_2(L_2(\sigma_{2_{ac}}(g))) \Rightarrow (R_3(g) \Leftrightarrow \sigma_2(R_2(\sigma_{2_{ac}}(g))) \wedge \alpha_2)$.

Let $g \in \Gamma_3$ such that $\sigma_{2_{ac}};\sigma_{1_{ac}}(g)$ is defined. We have to find a formula
$\alpha_3$ whose primed variables are contained in $(V_3' - \sigma_{1_{ch}};\sigma_{2_{ch}}(V_1)')$
such that: $\Phi_{ADT} \vdash \exists \bar{v}_3' : \alpha_3(\bar{v}_3')$ and
$\Phi_{ADT} \vdash \sigma_1;\sigma_2(L_1(\sigma_{2_{ac}};\sigma_{1_{ac}}(g))) \Rightarrow (R_3(g) \Leftrightarrow \sigma_1;\sigma_2(R_1(\sigma_{2_{ac}};\sigma_{1_{ac}}(g))) \wedge \alpha_3)$.
We propose $\alpha_3 = \sigma_{2_{ch}}(\alpha_1) \wedge \alpha_2$.
The “satisfiability” of $\alpha_3$ is justified, as for the case
of the initialisation, by the fact that both $\sigma_2(\alpha_1)$ and $\alpha_2$ are “satisfiable”,
and they refer to disjoint sets of variables. Proving that
$\Phi_{ADT} \vdash \sigma_1;\sigma_2(L_1(\sigma_{2_{ac}};\sigma_{1_{ac}}(g))) \Rightarrow (R_3(g) \Leftrightarrow \sigma_1;\sigma_2(R_1(\sigma_{2_{ac}};\sigma_{1_{ac}}(g))) \wedge \alpha_3)$ is also
simple; having in mind that $\sigma_1;\sigma_2(L_1(\sigma_{2_{ac}};\sigma_{1_{ac}}(g)))$ is stronger than $\sigma_2(L_2(\sigma_{2_{ac}}(g)))$
and $L_3(g)$, we can “expand” $R_3(g)$ into $\sigma_2(R_2(\sigma_{2_{ac}}(g))) \wedge \alpha_2$, and this into
$\sigma_2(\sigma_1(R_1(\sigma_{2_{ac}};\sigma_{1_{ac}}(g))) \wedge \alpha_1) \wedge \alpha_2$, obtaining what we wanted.








The rationale behind the definition of extension morphisms is the
characterisation of the substitutability principle (a property that can be shown to fail for
invasive superposition as defined in [6]). The following result shows that, if there
exists an extension morphism $\sigma$ between two designs $P_1$ and $P_2$ (and this
extension is realisable), then all behaviours exhibited by $P_1$ are also exhibited by $P_2$.
Since superposition morphisms, used as a representation of “clientship” (strictly,
the existence of a superposition morphism between two designs indicates that
the first is part of the second, as a component is part of a system when the first
is used by the system), restrict the behaviours of superposed components, it is
guaranteed that all behaviours exhibited by a component when this becomes
part of a system will also be exhibited by an extension of this component, if
the component is replaced by its extension in the system. Of course, one can also
obtain more behaviours, resulting from the explicit use of new actions of the
component. But if none of the new actions are used, then the extended component
behaves exactly as the original one.


Theorem 2. Let $P_1$ and $P_2$ be two CommUnity designs, and $\sigma : P_1 \rightarrow P_2$
an extension morphism between these designs. Then, every run of $P_1$ can be
embedded in a corresponding run of $P_2$.


Proof. For this theorem, we consider a semantics based on runs, i.e., infinite
sequences of interpretations such that they all coincide on the interpretation for
ADT, the first interpretation in the sequence satisfies the initial condition and
any pair of consecutive interpretations in the sequence either only differ in the
interpretation of input variables (stuttering), or they are in the “consequence”
relation for one of the actions of the component.

Let $P_1$ and $P_2$ be two CommUnity designs, and $\sigma : P_1 \rightarrow P_2$ an extension
morphism between these designs. Let $s = s_0, s_1, s_2, \ldots$ be a run for $P_1$. We will
inductively construct a sequence $s' = s'_0, s'_1, s'_2, \ldots$ which is a run for $P_2$, and
such that, for all i, $(s'_i)|_{\sigma_{ch}(V_1)} = s_i$, i.e., the reduct of each $s'_i$ to the symbols
originating in $P_1$ coincides with the interpretation $s_i$.


— Base case. The initialisation of $P_2$ is of the form $\sigma(Init_{P_1}) \wedge \alpha$, with $\alpha$ a
formula satisfying $\Phi_{ADT} \vdash \exists \bar{v} : \alpha(\bar{v})$, and whose variables are “new
variables”, in the sense that they differ from those appearing in the
initialisation of $P_1$. Then, there exists an interpretation $I_\alpha$ of the variables in $\alpha$
that makes it true. We define $s'_0$ as the extension of the interpretation $s_0$,
appropriately translated via $\sigma$, with the interpretation $I_\alpha$ for the remaining
variables. Clearly, this interpretation satisfies the initial condition of $P_2$, and
its reduct to the language of $P_1$ coincides with $s_0$.

— Inductive step. Assuming that we have already constructed a prefix
$s'_0, s'_1, s'_2, \ldots, s'_i$ of the run $s'$, we build the interpretation $s'_{i+1}$ in the following
way. We know that $s_{i+1}$ is in one of the following two cases:

  • $s_{i+1}$ is reached from $s_i$ via stuttering. In such a case, we define $s'_{i+1} = s'_i$,
  and clearly, by inductive hypothesis, we have that the reduct of $s'_{i+1}$ to
  the variables of $P_1$ coincides with $s_{i+1}$, and $(s'_i, s'_{i+1})$ are in the
  “stuttering relationship”.

  • there exists some action $g \in \Gamma_1$ such that $(s_i, s_{i+1})$ are in the consequence
  relationship corresponding to g. If this is the case, notice that, under the
  “stronger guard” $L_1(g)$, the assignment of an action $g_2$ in $\sigma_{ac}^{-1}(g)$ (which
  is nonempty, since $\sigma_{ac}$ is surjective) is of the form $\sigma(R_1(\sigma_{ac}(g_2))) \wedge \alpha$, for
  a formula $\alpha$ referring only to the primed versions of new variables. Since
  we know that $\Phi_{ADT} \vdash \exists \bar{v}' : \alpha(\bar{v}')$, there exists an interpretation $I_\alpha$ of the
  variables in $\alpha$ that makes it true. We define $s'_{i+1}$ as the extension of the
  interpretation $s_{i+1}$, with symbols appropriately translated via $\sigma$, with
  $I_\alpha$ for the interpretation of the remaining variables. It is straightforward
  to see that $(s'_i, s'_{i+1})$ are in the consequence relation of $g_2$, and that the
  reduct of $s'_{i+1}$ to the variables originating in $P_1$ coincides with $s_{i+1}$.
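The construction in this proof can be illustrated on the Sender and RES_Sender designs. The sketch below works on a finite prefix of a run rather than on an infinite run, writes the channel lost-msg as `lost_msg`, and picks the value 0 for it arbitrarily (any value would do, since it is an input channel). The helper names `embed_run`, `reduct` and `is_run_prefix` are hypothetical, and the relations spell out the frame condition (msg is unchanged by send) explicitly.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

State = Dict[str, object]
# An action is a pair (guard, R), where R relates a pre-state s to a post-state t.
Action = Tuple[Callable[[State], bool], Callable[[State, State], bool]]


def res_init(s: State) -> bool:
    return s["msg"] == 0 and not s["rts"]


RES_SENDER: Dict[str, Action] = {
    "prod": (lambda s: not s["rts"],
             lambda s, t: t["rts"] and t["msg"] == s["msg"] + 1),
    "send": (lambda s: s["rts"],
             lambda s, t: (not t["rts"]) and t["msg"] == s["msg"]),
    "reset": (lambda s: True,
              lambda s, t: t["rts"] and t["msg"] == s["lost_msg"]),
}


def embed_run(run: List[State], new_values: State) -> List[State]:
    """Extend every state of the original run with values for the new channels."""
    return [{**s, **new_values} for s in run]


def reduct(run: List[State], old_channels: Iterable[str]) -> List[State]:
    """Forget the new channels again."""
    return [{k: s[k] for k in old_channels} for s in run]


def is_run_prefix(states: List[State], init: Callable[[State], bool],
                  actions: Dict[str, Action], inputs: Set[str]) -> bool:
    """Each step must be a stuttering step or the effect of an enabled action."""
    if not init(states[0]):
        return False
    for s, t in zip(states, states[1:]):
        stutter = all(s[k] == t[k] for k in s if k not in inputs)
        fired = any(guard(s) and rel(s, t) for guard, rel in actions.values())
        if not (stutter or fired):
            return False
    return True


# A prefix of a run of Sender: prod, send, prod.
SENDER_RUN: List[State] = [
    {"msg": 0, "rts": False},
    {"msg": 1, "rts": True},
    {"msg": 1, "rts": False},
    {"msg": 2, "rts": True},
]
EMBEDDED = embed_run(SENDER_RUN, {"lost_msg": 0})
```

Here `EMBEDDED` is accepted by `is_run_prefix` as a prefix of a run of RES_Sender, and `reduct(EMBEDDED, ["msg", "rts"])` gives back `SENDER_RUN`, which mirrors the embedding stated by Theorem 2.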





3.1 Replacing Components by Extensions in Configurations 


The intention of extension morphisms is to characterise component extension,
respecting the substitutability principle. One might then expect that, if a
component C can be “plugged” into an architecture of components, then we should
be able to plug an extension C' in the architecture, instead of C. Due to the
restrictions for valid extension, it can be guaranteed that a design in a well formed
diagram can be replaced with an extension of it, preserving the wellformedness
of the diagram (although it is necessary to consider an “open system” semantics,
since extensions might introduce new input variables, which would be
“disconnected” after the replacement). Moreover, we can also prove that the colimit
of the new diagram (where a component was replaced by an extension of it) is
actually an extension of the colimit of the original diagram. This basically means
that the joint behaviour of the original system is augmented by the extension
of a component, but never restricted (i.e., the resulting system exhibits all the
behaviours of the original one, and normally also more behaviours).

We are not in a similar situation when combining extension and refinement. 
As we mentioned, refinement plays an important role in the implementation 
of higher-order connectors, since it allows us to “instantiate” roles with actual 
components. Roles, as the Abstract_Sender example, specify the minimum re- 
quirements that have to be satisfied in order to be able to plug components using 
a higher-order connector. Notice that, in particular, the interval determined by
the guards of actions of the role has to be preserved or reduced by the actual 
parameter, i.e., the component with which the role is instantiated. Consider, for 
instance, the case of the Abstract_Sender. As we mentioned, the design Sender 
refines this more abstract Abstract_Sender, and therefore we can instantiate 
the abstract sender with the concrete one. Moreover, RES_Sender, an extension 
of Sender, also refines Abstract_Sender, so it also can instantiate this role. 
However, since extensions weaken both guards, it is not difficult to realise that, 
if a component B refines a component A, and B’ is an extension of B, then it 
is not necessarily the case that B’ also refines A. With respect to configurations 
of systems, this means that, when replacing components by corresponding ex- 
tensions, one might lose the possibility of applying or using some higher-order 
connectors. 

Although this might seem an unwanted restriction, it is actually rather nat- 
ural. The conditions imposed by roles of a higher-order connector are a kind 
of “rely-guarantee” assumptions. When extending a component we might lose 
some properties the role requires for the component. 


4 An Example Using Extension 


Let us go back to our example of communicating components via an unreli- 
able channel. As we explained in previous sections, we would like to superpose 
behaviour on the existing architecture, to make the communication reliable by 
implementing a reset in the communication when packets are lost. The mecha- 
nism we used was very simple, and required a “reset” operation on the sender, 
which, as we discussed, can be achieved by component extension. In order to 
complete the enhanced architecture to implement the reset acknowledgement 
mechanism, we need a monitor that, if it detects a missing packet, issues a call 
for reset. The idea is that, if a message is not what the monitor expected (char- 
acterised by the msg-exp), then it will go to a “reset” cycle, and wait to see if 
the expected packet arrives. If the expected packet arrives, then the component 
will start waiting for the next packet. Notice that, for the sake of simplicity, we 
assume that the communication between the monitor and the extended sender 
is reliable. The monitor used for this task is shown in Figure 10. The final
architecture for the system is shown in Figure 11.

Notice that, since the superposed monitor is spectative, we can guarantee 
that, if the augmented system works without the need for reset in the commu- 
nication, i.e., no messages are lost, then its behaviour is exactly the same as the 
one of the original architecture with unreliable communication. 
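
The behaviour of the RES_Monitor design (Fig. 10) can be approximated by a small sequential simulation. The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical, the guards of rec3 and res are read as "not w", and the synchronisation of monitor actions with other components of the configuration is ignored.

```java
/** Hypothetical, simplified simulation of the RES_Monitor design of Fig. 10. */
final class ResMonitorSketch {
    int msgExp = 0;     // msg-exp: sequence number the monitor expects next
    int msgRst = 0;     // msg-rst: output, sequence number to restart from
    boolean w = true;   // w: true while the monitor is in its normal waiting state

    /** Fires the enabled "rec" action for the observed input msg and returns its name. */
    String receive(int msg) {
        if (w && msgExp == msg) {
            // rec1: the expected packet arrived, wait for the next one
            msgExp = msgExp + 1;
            return "rec1";
        } else if (w) {
            // rec2: unexpected packet, publish the reset point and leave the waiting state
            msgRst = msgExp;
            w = false;
            return "rec2";
        } else {
            // rec3 (assumed guard: not w): msg-exp is left unchanged
            return "rec3";
        }
    }

    /** Action res (assumed guard: not w); returns false when the action is not enabled. */
    boolean reset() {
        if (w) {
            return false;
        }
        w = true;
        return true;
    }
}
```

In this simulation a lost packet is detected by rec2, which records in msgRst the sequence number from which the sender should restart; res then brings the monitor back to its waiting state.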


5 Related Work 


The original work on CommUnity took its inspiration from languages like Unity
[3] and IP [8], and from related software engineering research [12] using
superimposition/superposition as structuring principles. Recently, research by Katz and
Design RES_Monitor
in
  msg: int
out
  msg-rst: int
prv
  msg-exp: int
  w: bool
init
  msg-rst=0 ∧ msg-exp=0 ∧ w=true
do
  rec1[msg-exp]: w ∧ msg-exp=msg → msg-exp'=msg-exp+1
  rec2[msg-exp,msg-rst,w]: w ∧ msg-exp≠msg →
      msg-exp'=msg-exp ∧ msg-rst'=msg-exp ∧ w'=false
  rec3[msg-exp]: ¬w → msg-exp'=msg-exp
  res[w]: ¬w → w'=true


Fig. 10. A monitor for detecting lost packets.


his collaborators has recognised the usefulness of superimposition as a way of
characterising aspects [13, 11, 22]. Especially in [11], there is a recognition of the
same principles we espouse in this paper, namely that aspects should be charac- 
terised and applied at the architectural level of software development. Aspects 
are seen as patterns to be applied to underlying architectures (which may already 
have been modified by the application of previous concerns), based on specifica- 
tions of the aspects. These specifications include descriptions of components and 
connectors used to define the aspect, as well as “dummy” components defining 
required services in order to be able to apply the aspect. The relationships and 
structuring mechanisms and the instantiation of the “dummy” components are 
explained in terms of superimpositions. 


The motivation for our research is very similar: we want to lift the treatment
of aspects to the architectural level and view the application of aspects to the 
design of some underlying system as the application of a transformation defined 
by the aspect design to the underlying architecture, resulting in an augmented 
architecture. The application of various aspects can be seen as the application of 
a sequence of transformations to the underlying architecture (see [2]). This raises 
concerns analogous to those discussed in [11]. In order to develop this framework,
we found it necessary to come to a better understanding of invasive superposi- 
tions in the context of CommUnity. In particular, we needed to characterise a 
structured use of invasive superpositions, which allows arbitrary changes break- 
ing encapsulation of the component being superimposed. As noted earlier, this 
problem has also arisen in the context of object oriented design and program- 
ming, resulting in the various proposals attempting to characterise the concept 
of substitutivity ([15]). We believe that this proposal provides a more solid
foundation for substitutivity, one that is better structured and more amenable to
analysis.


Fig. 11. The architecture of the system, with the reset mechanism.

Of course, the work reported in [16, 17] is related to our work, both because
it is based on CommUnity and because it recognises that the concept of higher 
order connector (a kind of parameterised connector that can be applied to other 
connectors to obtain more sophisticated connectors) can be used to characterise 
certain aspects. Again the emphasis is on using the specification of the aspect, as 
a higher order connector, to transform an existing architectural pattern in order 
to apply the aspect. As we demonstrate in this paper, some interesting aspects 
cannot be characterised in terms of this mechanism alone and it is necessary to 
consider transformations that apply uniformly to connectors and to the compo- 
nents they connect. Furthermore, some of the transformations require the use 
of invasive superpositions, as in the main example used in this paper. This is a 
subject that has received very little scrutiny in the CommUnity literature. 


6 Conclusion 


We have studied a special kind of invasive superposition for the characterisation 
of extensions between designs in the CommUnity architecture design language. 
This kind of morphism, that we have defined with special concern regarding 
the substitutability principle (an essential property associated with sound 
component extension), allows us to complement the refinement and (regulative) 
superposition morphisms of CommUnity, and obtain a suitable formal frame- 
work to characterise certain aspects, in the sense of aspect oriented software de- 
velopment. We have argued that some useful aspects require extensions on the 
components, as well as in the connectors, and therefore the introduced extension 
morphisms are necessary. Also, having the possibility of extending components
provides us with a way of balancing the distribution of augmented behaviour in the


192 Nazareno Aguirre, Tom Maibaum, and Paulo Alencar 


connectors and the components, which would otherwise be put exclusively on 
the connector side (typically by means of higher-order connectors). 

We illustrated the need for extension morphisms by means of a simple case 
study based on the communication of two components via an unreliable channel. 
We then augmented the behaviour of this original system with a fault tolerance 
aspect for making the communication reliable, which required the extension of 
components, as well as the use of higher-order connectors. This small case study 
also allowed us to illustrate the relationships and combined use of extension, 
superposition and refinement morphisms. 

As we mentioned before, this problem has also arisen in the context of
object oriented design and programming, in attempts to define various forms of
inheritance, resulting in proposals that attempt to characterise the concept
of substitutability [15, 21]. We believe that this proposal provides a more solid
foundation for substitutivity, one that is better structured and more amenable 
to analysis. The definition of extension in CommUnity that we introduced has 
been partly motivated by the definitions and proof obligations used to define 
the structuring mechanisms in B [J4], that justifies the notion of substitutivity 
and provides a structuring principle for augmenting components by breaking the 
encapsulation of the component. 
Abstract. The concept of formal islands allows adding formal features to
existing programming languages; these features can later be compiled into
the host language itself, therefore inducing no dependency on the formal
language. We illustrate this approach with the TOM system that
provides matching, normalization and strategic rewriting, and we give a
formal island implementation for the simulation of a chemical reactor.


1 Introduction 


Concerned by the crucial need for improvement of existing software in their 
logic, algorithmic, security and maintenance qualities, formal methods are more 
and more used in the software design process. Usually they come into play both 
at the design and verification levels either for formal specification or high-level 
programming. But this approach does not take into account existing software, 
while billions of code lines are executed every day. This might be one of the 
reasons why formal methods did not yet fully succeed at the industrial level. 

Among many formal method approaches, algebraic techniques providing a 
clear semantics for signatures and rewrite rules are used in high-level languages 
and environments like ASF+SDF [25], Maude [9], CafeOBJ [11], or ELAN [6, 18]
which have been designed on these concepts. These rule-based systems have 
gained considerable interest with the development of efficient compilers. How- 
ever, when programs are developed in these languages, they can hardly interact 
with programs written in another language like C or Java. 

The work presented here proposes an alternative reconciling the use of algebraic
formal features with the widely used object-oriented language Java. This is
possible through the Formal Islands approach, developed over the past few years
in the Protheo project team [21]. A formal island is a piece of code introducing formal
features. These new features are anchored in terms of the available function- 
alities of the host language. Once compiled, these features are translated into 
pure host language constructs preserving the behavior of the program. The for- 
mal island concept is implemented through the software system TOM [3] which 
is built upon the concepts of rules and strategic rewriting. TOM is a good language
for programming by pattern-matching, and it is particularly well-suited for
programming various transformations on trees/terms or XML data-structures. 
Moreover, its compiler has been designed with the TOM language. 


The approach and the use of TOM are illustrated in this paper with a specific 
example: we apply strategic rewriting to model a chemical reactor by means of a 
formal island implementation. The considered problem is the automated gener- 
ation of reaction mechanisms: a set of molecules and a list of generic elementary 
reactions (reaction patterns) are given as input to a generator that produces the 
list of all possible elementary reactions according to a specific reactor dynamics. 
The solution of this problem consists of generating all possible reactions and 
collecting all products starting from a small set of reactants. We are therefore 
interested only in the qualitative aspects of this problem. 


A number of software systems have been developed for the automated
generation of reaction mechanisms [10, 24]. As far as the literature reports, these
systems are implemented using traditional programming languages, employing
rather ad-hoc data structures and procedures for the representation and transfor- 
mations of molecules (e.g. Boolean adjacency matrices and matrices transforma- 
tions). Furthermore, existing systems are limited, sometimes by their implemen- 
tation technology, to acyclic species, or mono-cyclic species, whereas combustion 
mechanisms often involve aromatic species, which are polycyclic. 


In the GasEl project we have already explored the use of rule-based
systems and strategies for the problem of automated generation of kinetics
mechanisms [24, 10] in the whole context of its use by chemists and industrial
partners. In GasEl the representation of chemical species uses the notion of molecular
graphs, encoded by a term structure called GasEl term [7] which is inspired by the
linear notation SMILES [31]. The graph isomorphism test is based on the Unique
SMILES algorithm, which provides a unique notation for each chemical structure
regardless of the many possible equivalent descriptions of the structure that
might be input; the order of this algorithm is $N^2 \log_2 N$ where $N$ is the number
of atoms in the structure. Reaction patterns are encoded by a set of conditional
rewriting rules on GasEl terms. The molecular graph rewriting relation is simulated
by a rewriting relation on equivalence classes of terms [8]. The control of
the chaining of chemical reactions (i.e. the reactor dynamics) is described using a
strategy language [7]. The GasEl prototype is implemented in ELAN [6, 18], encoding
a set of nine reaction patterns. Qualitative validations have been performed with
chemists [16].


The formal background of strategic rewriting is quite relevant for the consid- 
ered problem: (i) chemical reactions are naturally expressed by chemists them- 
selves using conditional rules; (ii) matching power associated with rewriting al- 
lows retrieving patterns in chemical species; (iii) defining the control on rules is 
essential for designing automated mechanisms generators in a flexible way and 
for controlling combinatorial explosion. This allows the chemist to activate
and deactivate reaction patterns, and to tune their application
during each stage. The main technical difficulty with the ELAN implementation
consisted in encoding reaction patterns on GasEl terms so that they correctly
simulate the corresponding transformations on molecular graphs. The TOM
implementation provides another approach to this problem, while keeping the same
molecular graph rewriting relation, and preserving the same chemical principles
and hypotheses as in GasEl.


The paper is structured as follows. Section 2 presents the formal island
concept that will be further illustrated in the sequel. The TOM system is briefly
described in Section 3 and the main language constructions needed to understand
the considered application are introduced. Section 4 is devoted to the chemical
example and explains what kind of reactor is modelled. Section 5 addresses the
formal island implementation of the chemical reactor and details the different
steps performed to achieve the Java implementation. Finally Section 6 draws
some conclusions and perspectives for future work.


2 Formal Islands 


For several years, we have been strongly concerned with the feasibility of strategic
rewriting as a practical programming paradigm [ITIS]. The development of
efficient compilation concepts and techniques took an important place in the 
language support design. The results presented in [20] led to a quite efficient 
implementation and thus demonstrated the practicality of the paradigm. 

Making strategic rewriting easily available in many programming languages 
was the main concern that led to the emergence of formal islands. This concept
provides a general way to make formal methods, and in particular matching and 
rewriting, widely available. 

We use the notions of formal island and anchoring to extend an existing 
language with formal capabilities. A formal island is a piece of code introducing 
formal features, while anchoring means to describe these new features in terms of 
the available functionalities of the host language. Once compiled, these features 
are translated into pure host language constructs, allowing us to say that the 
formal islands are not intrusive with respect to the behavior of the application. 

In the following we review the definitions of representation functions and 
formal anchor for the unsorted case. 

In order to precisely define these notions, we recall a few concepts of first
order term algebra needed here [17]. A signature $\mathcal{F}$ is a set of function symbols,
each one associated to a natural number by the arity function, $ar : \mathcal{F} \to \mathbb{N}$. $\mathcal{F}_n$
is the set of function symbols of arity $n$, $\mathcal{F}_n = \{f \in \mathcal{F} \mid ar(f) = n\}$. $\mathcal{T}(\mathcal{F}, \mathcal{X})$ is
the set of terms from a given finite set $\mathcal{F}$ of function symbols and a denumerable
set $\mathcal{X}$ of variable symbols. A position within a term is represented as a sequence
$\omega$ of positive integers describing the path from the root of the term to the root
of the subterm at that position, denoted by $t_{|\omega}$. $Symb(t)$ is a partial function
from $\mathcal{T}(\mathcal{F}, \mathcal{X})$ to $\mathcal{F}$ which associates to each term $t$ its root symbol $f \in \mathcal{F}$. The
set of variables occurring in a term $t$ is denoted by $Var(t)$. If $Var(t)$ is empty,
$t$ is called a ground term and $\mathcal{T}(\mathcal{F})$ is the set of ground terms. We write $t_1 = t_2$
when $t_1$ and $t_2$ are syntactically equal.
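
The notions recalled above (root symbol, position, variables, ground term, syntactic equality) can be made concrete with a small term class. This is a minimal sketch, not the representation used by TOM: the class name TermSketch and all method names are hypothetical, and Symb is modelled as returning null on variables to reflect that it is a partial function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** A first-order term: either a variable or an application f(t1, ..., tn). */
final class TermSketch {
    final String symbol;          // function symbol, or the variable name
    final boolean isVariable;
    final List<TermSketch> args;  // immediate subterms (empty for variables and constants)

    private TermSketch(String symbol, boolean isVariable, List<TermSketch> args) {
        this.symbol = symbol;
        this.isVariable = isVariable;
        this.args = args;
    }

    static TermSketch var(String name) {
        return new TermSketch(name, true, new ArrayList<TermSketch>());
    }

    static TermSketch app(String f, TermSketch... args) {
        return new TermSketch(f, false, Arrays.asList(args));
    }

    /** Symb(t): the root symbol; undefined (null here) on variables. */
    String symb() {
        return isVariable ? null : symbol;
    }

    /** The subterm at a position, given as a sequence of 1-based child indices. */
    TermSketch at(int... position) {
        TermSketch current = this;
        for (int i : position) {
            current = current.args.get(i - 1);
        }
        return current;
    }

    /** Var(t): the set of variable names occurring in the term. */
    Set<String> vars() {
        Set<String> result = new TreeSet<String>();
        collectVars(result);
        return result;
    }

    private void collectVars(Set<String> acc) {
        if (isVariable) {
            acc.add(symbol);
        }
        for (TermSketch a : args) {
            a.collectVars(acc);
        }
    }

    /** A term is ground when Var(t) is empty. */
    boolean isGround() {
        return vars().isEmpty();
    }

    /** Syntactic equality: same kind, same symbol, pairwise equal subterms. */
    boolean syntacticallyEquals(TermSketch other) {
        if (isVariable != other.isVariable || !symbol.equals(other.symbol)
                || args.size() != other.args.size()) {
            return false;
        }
        for (int i = 0; i < args.size(); i++) {
            if (!args.get(i).syntacticallyEquals(other.args.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

For the term f(a, g(x)), the position 2.1 designates the variable x, so `t.at(2, 1)` returns that variable and `t.isGround()` is false.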
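Definition 1. ([19]) Given a tuple composed of a signature $\mathcal{F}$, a set of variables
$\mathcal{X}$, booleans $\mathbb{B}$ and integers $\mathbb{N}$, given sets of host language constructs $\Omega_{\mathcal{F}}$, $\Omega_{\mathcal{X}}$,
$\Omega_{\mathcal{T}}$, $\Omega_{\mathbb{B}}$, and $\Omega_{\mathbb{N}}$, we consider a family of representation functions $\lceil\ \rceil$ that map:

– function symbols $f \in \mathcal{F}$ to elements of $\Omega_{\mathcal{F}}$, denoted $\lceil f \rceil$,
– variables $v \in \mathcal{X}$ to elements of $\Omega_{\mathcal{X}}$, denoted $\lceil v \rceil$,
– ground terms $t \in \mathcal{T}(\mathcal{F})$ to elements of $\Omega_{\mathcal{T}}$, denoted $\lceil t \rceil$,
– booleans $b \in \mathbb{B} = \{\top, \bot\}$ to elements of $\Omega_{\mathbb{B}}$, denoted $\lceil b \rceil$,
– natural numbers $n \in \mathbb{N}$ to elements of $\Omega_{\mathbb{N}}$, denoted $\lceil n \rceil$.
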

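Definition 2. ([19]) Given a tuple $(\mathcal{F}, \mathcal{X}, \mathcal{T}(\mathcal{F}), \mathbb{B}, \mathbb{N})$ and the operations
$eq : \Omega_{\mathcal{T}} \times \Omega_{\mathcal{T}} \to \Omega_{\mathbb{B}}$, $\mathit{is\_fsym} : \Omega_{\mathcal{T}} \times \Omega_{\mathcal{F}} \to \Omega_{\mathbb{B}}$, and $subterm_f : \Omega_{\mathcal{T}} \times \Omega_{\mathbb{N}} \to \Omega_{\mathcal{T}}$
($f \in \mathcal{F}$), a representation function $\lceil\ \rceil$ is a formal anchor if it preserves the
structural properties of $\mathcal{T}(\mathcal{F})$ in $\lceil \mathcal{T}(\mathcal{F}) \rceil$ by the semantics of $eq$, $\mathit{is\_fsym}$, and
$subterm_f$:

$\forall t, t_1, t_2 \in \mathcal{T}(\mathcal{F}),\ \forall f \in \mathcal{F},\ \forall i \in [1..ar(f)]$:

$eq(\lceil t_1 \rceil, \lceil t_2 \rceil) \equiv \lceil t_1 = t_2 \rceil$
$\mathit{is\_fsym}(\lceil t \rceil, \lceil f \rceil) \equiv \lceil Symb(t) = f \rceil$
$subterm_f(\lceil t \rceil, \lceil i \rceil) \equiv \lceil t_{|i} \rceil$ if $Symb(t) = f$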

We illustrate the concept of formal anchor with a small example from [19]:


Example 1. In C or Java like languages, the notion of term can be implemented by
a record (sym: integer, sub: array of term), where the first slot (sym) denotes the top
symbol, and the second slot (sub) corresponds to the subterms. It is easy to check
that the following definitions of $eq$, $\mathit{is\_fsym}$, and $subterm_f$ (where = denotes an atomic
equality) provide a formal anchor for $\mathcal{T}(\mathcal{F})$:


$eq(t_1, t_2) \triangleq t_1.sym = t_2.sym \wedge \forall i \in [1..ar(t_1.sym)],\ eq(t_1.sub[i], t_2.sub[i])$
$\mathit{is\_fsym}(t, f) \triangleq t.sym = f$
$subterm_f(t, i) \triangleq t.sub[i]$ if $t.sym = f \wedge i \in [1..ar(f)]$
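
The record of Example 1 and its three operations can be written directly in Java. The sketch below is an illustration only: the class names RecordTerm and RecordAnchor are hypothetical, symbols are encoded as plain integers, and the arity of a symbol is assumed to be the length of the sub array of the term it heads.

```java
/** The record of Example 1: sym is the top symbol, sub holds the subterms. */
final class RecordTerm {
    final int sym;
    final RecordTerm[] sub;

    RecordTerm(int sym, RecordTerm... sub) {
        this.sym = sym;
        this.sub = sub;
    }
}

/** The three operations required of a formal anchor, for RecordTerm. */
final class RecordAnchor {
    /** eq: same top symbol and pairwise equal subterms. */
    static boolean eq(RecordTerm t1, RecordTerm t2) {
        if (t1.sym != t2.sym || t1.sub.length != t2.sub.length) {
            return false;
        }
        for (int i = 0; i < t1.sub.length; i++) {
            if (!eq(t1.sub[i], t2.sub[i])) {
                return false;
            }
        }
        return true;
    }

    /** is_fsym: tests whether the top symbol of t is f. */
    static boolean isFsym(RecordTerm t, int f) {
        return t.sym == f;
    }

    /** subterm_f(t, i): the i-th subterm (1-based), defined only if t.sym = f and i is within the arity. */
    static RecordTerm subterm(int f, RecordTerm t, int i) {
        if (t.sym != f || i < 1 || i > t.sub.length) {
            throw new IllegalArgumentException("subterm_f is undefined for these arguments");
        }
        return t.sub[i - 1];
    }
}
```

The index shift in subterm reflects that positions in the definitions start at 1 while Java arrays start at 0.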


3 TOM 


TOM is an implementation of the idea of formal island [21]. TOM [8] provides
matching, normalization, and strategic rewriting in Java, C, and Caml [21, 15].
In particular, we have used Java for developing the chemical application de- 
scribed in this paper. In each of the three instances, matching and rewriting 
primitives can be combined with constructs of the programming language, then 
compiled to the host language, using similar techniques as for compiling ELAN. 
The normal forms provided by rewriting are available to get conciseness and ex- 
pressiveness in programs written in the host language. Moreover one can prove 
that these sets of rewrite rules have useful properties like termination or con- 
fluence. Once the programmer has used rewriting to specify functionalities and 
to prove properties, the compilation dissolves this formal island in the existing 
code. The TOM constructs are non-intrusive because their use induces no
dependence: once compiled, a TOM program contains no more trace of the rewriting
and matching statements that were used to build it. 

Basically, a TOM program is a list of blocks, where each block is either a TOM 
construct, or a sequence of characters. The idea is that after transformation, 
the sequence of characters merged with the compiled TOM constructs becomes 
a valid host language program having the same behavior as the initial TOM 
program. 

The main construct, %match, is similar to the match primitive found in
functional languages: given an object (called subject) and a list of patterns-actions,
the match primitive selects the first pattern that matches the subject and per- 
forms the associated action. The subject against which we match can be any 
object, but in practice, this object is usually a tree-based data-structure, also 
called a term in the algebraic programming community. The match construct 
may be seen as an extension of the classical switch/case construct. The main 
difference is that the discrimination occurs on a term and not on atomic values 
like characters or integers: the patterns are used to discriminate and retrieve 
information from an algebraic data structure. 
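
To make the "first matching pattern" discipline concrete, here is hand-written Java that plays the role of a two-case match on a term. It is only a sketch of the idea under stated assumptions: it is not TOM syntax and not the code TOM generates, and the names MatchSketch, T, plus, num and toInt are hypothetical. The two cases correspond to the patterns zero() and suc(x) for Peano addition.

```java
final class MatchSketch {
    /** A tiny term type: a symbol with its subterms. */
    static final class T {
        final String sym;
        final T[] sub;

        T(String sym, T... sub) {
            this.sym = sym;
            this.sub = sub;
        }
    }

    /**
     * Matches the subject t1 against two patterns, in order:
     *   zero()  : the action returns t2
     *   suc(x)  : x is bound to the subterm, the action returns suc(plus(x, t2))
     */
    static T plus(T t1, T t2) {
        if (t1.sym.equals("zero") && t1.sub.length == 0) {
            return t2;
        }
        if (t1.sym.equals("suc") && t1.sub.length == 1) {
            T x = t1.sub[0];   // variable instantiated by matching
            return new T("suc", plus(x, t2));
        }
        throw new IllegalArgumentException("no pattern matches the subject");
    }

    /** Builds the Peano numeral suc(...suc(zero())...) with n occurrences of suc. */
    static T num(int n) {
        T t = new T("zero");
        for (int i = 0; i < n; i++) {
            t = new T("suc", t);
        }
        return t;
    }

    /** Counts the suc symbols of a Peano numeral. */
    static int toInt(T t) {
        int n = 0;
        while (t.sym.equals("suc")) {
            n++;
            t = t.sub[0];
        }
        return n;
    }
}
```

Unlike a switch on atomic values, each case both discriminates on the structure of the subject and retrieves a subterm from it.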

In addition to %match, TOM provides the %rule construct which allows
describing rewrite rule systems. This construct supports conditional rewrite rules
as well as rules with matching conditions (as in ELAN or ASF+SDF). By default, 
TOM rules provide a leftmost innermost normalization strategy which computes 
normal forms in an efficient way. It is of course possible to combine these features 
with more complex strategies, like generic traversal strategies, to describe more 
complex or generic transformations. Once all the possibilities offered by TOM
are taken into account, this general purpose system becomes as powerful and expressive
as many specific rewrite rule based programming languages.

Another construct of TOM is the backquote (‘). This construct is used for 
building an algebraic term or to retrieve the value of a TOM variable (a variable 
instantiated by pattern-matching). 

The %vas construct allows the user to define a many-sorted signature. This 
construct is replaced at compile time by the content of the generated formal 
anchor. 

Other available constructs like %typeterm, %typelist, and %op which define
the formal anchor between signature formalism and concrete implementations 
(Java classes) allow performing pattern matching against any data structure. 

In order to make TOM easier to use, two tools were developed: ApiGen
and Vas [3]. ApiGen is a system which takes a many-sorted signature as input, 
and generates both a concrete implementation for the abstract data-type (for 
example Java classes), and a formal anchor for TOM. Vas is a preprocessor for 
ApiGen which provides a human-readable syntax definition formalism inspired 
from SDF. These two systems are useful for manipulating Abstract Syntax Trees 
since they offer an efficient implementation based on ATerms [26] which supports 
maximal memory sharing, strong static typing, as well as parsers and pretty- 
printers. 
TOM provides a library inspired by ELAN, Stratego [27], and JJTraveler [29],
which allows us to easily define various kinds of traversal strategies. Figure 1
provides an algebraic view of elementary strategy constructors, and defines their
evaluation using the application operator @. We note that, according to the
definition, if c is a constant operator then All(s)@(c) returns c, while One(s)@(c)
returns failure. In this context, the application of a strategy to a term can fail.
In Java, the failure is implemented by an exception (VisitFailure).


Identity@(t)                  => t
Fail@(t)                      => failure
Sequence(s1, s2)@(t)          => failure                     if s1@(t) fails
                                 s2@(t')                     if s1@(t) => t'
Choice(s1, s2)@(t)            => t'                          if s1@(t) => t'
                                 s2@(t)                      if s1@(t) fails
All(s)@(f(t1, ..., tn))       => f(t1', ..., tn')            if s@(t1) => t1', ..., s@(tn) => tn'
                                 failure                     if there exists i such that s@(ti) fails
One(s)@(f(t1, ..., tn))       => f(t1, ..., ti', ..., tn)    if s@(ti) => ti'
                                 failure                     if s@(t1) fails, ..., s@(tn) fails
Omega(i, s)@(f(t1, ..., tn))  => f(t1, ..., ti', ..., tn)    if s@(ti) => ti'
                                 failure                     if s@(ti) fails


Fig. 1. Strategy constructors


These strategy constructors are the key components that can be used to define
more complex strategies. In order to define recursive strategies, the μ abstractor
was introduced. This allows giving a name to the current strategy, which can be
referenced later. Using strategy operators and the μ abstractor, new strategies
can be defined as illustrated by Figure 2.


Try(s)       = Choice(s, Identity)
Repeat(s)    = μx.Choice(Sequence(s, x), Identity)
BottomUp(s)  = μx.Sequence(All(x), s)
TopDown(s)   = μx.Sequence(s, All(x))
Innermost(s) = μx.Sequence(All(x), Try(Sequence(s, x)))


Fig. 2. Examples of strategies


The Try strategy never fails: it tries to apply the strategy s; if it succeeds, the 
result is returned, otherwise, the Identity strategy is applied, and the subject is 
not modified. 

The Repeat strategy applies the strategy s as many times as possible, until 
a failure occurs. The last unfailing result is returned. 

The strategy BottomUp tries to apply the strategy s to all nodes, starting 
from the leaves. Note that the application of s should not fail, otherwise the 
whole strategy also fails. 
The TopDown strategy tries to apply the strategy s to all nodes, starting
from the root. It fails if the application of s fails at least once. 

The strategy Innermost tries to apply s as many times as possible, starting 
from the leaves. This construct is useful to compute normal forms. 
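
The strategy constructors and the derived strategies described in this section can be sketched as plain Java combinators. This is a minimal illustration and not TOM's strategy library: the names StrategySketch, Term, Strategy and Failure are hypothetical (Failure is a local stand-in for the exception mentioned above), Omega is omitted, and the μ abstractor is modelled by ordinary recursion between Java methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StrategySketch {
    /** A term f(t1, ..., tn); constants have no arguments. */
    static final class Term {
        final String symbol;
        final List<Term> args;

        Term(String symbol, List<Term> args) {
            this.symbol = symbol;
            this.args = args;
        }

        @Override
        public String toString() {
            if (args.isEmpty()) {
                return symbol;
            }
            StringBuilder sb = new StringBuilder(symbol).append('(');
            for (int i = 0; i < args.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(args.get(i));
            }
            return sb.append(')').toString();
        }
    }

    static Term term(String symbol, Term... args) {
        return new Term(symbol, Arrays.asList(args));
    }

    /** Signals that a strategy application failed. */
    static final class Failure extends Exception { }

    interface Strategy {
        Term apply(Term t) throws Failure;
    }

    static Strategy identity() { return t -> t; }

    static Strategy fail() { return t -> { throw new Failure(); }; }

    static Strategy sequence(Strategy s1, Strategy s2) { return t -> s2.apply(s1.apply(t)); }

    static Strategy choice(Strategy s1, Strategy s2) {
        return t -> {
            try {
                return s1.apply(t);
            } catch (Failure e) {
                return s2.apply(t);
            }
        };
    }

    /** All(s): applies s to every immediate subterm; fails if any application fails. */
    static Strategy all(Strategy s) {
        return t -> {
            List<Term> out = new ArrayList<>();
            for (Term a : t.args) {
                out.add(s.apply(a));
            }
            return new Term(t.symbol, out);
        };
    }

    /** One(s): applies s to the first immediate subterm on which it succeeds. */
    static Strategy one(Strategy s) {
        return t -> {
            for (int i = 0; i < t.args.size(); i++) {
                try {
                    Term r = s.apply(t.args.get(i));
                    List<Term> out = new ArrayList<>(t.args);
                    out.set(i, r);
                    return new Term(t.symbol, out);
                } catch (Failure e) {
                    // s failed on this subterm, try the next one
                }
            }
            throw new Failure();
        };
    }

    static Strategy tryS(Strategy s) { return choice(s, identity()); }

    // Recursive definitions: the recursive call is delayed inside the lambda.
    static Strategy repeat(Strategy s) {
        return t -> choice(sequence(s, repeat(s)), identity()).apply(t);
    }

    static Strategy bottomUp(Strategy s) {
        return t -> sequence(all(bottomUp(s)), s).apply(t);
    }

    static Strategy topDown(Strategy s) {
        return t -> sequence(s, all(topDown(s))).apply(t);
    }

    static Strategy innermost(Strategy s) {
        return t -> sequence(all(innermost(s)), tryS(sequence(s, innermost(s)))).apply(t);
    }
}
```

With a rule that rewrites the constant a to b and fails elsewhere, `bottomUp(tryS(rule))` and `innermost(rule)` both turn f(a, g(a)) into f(b, g(b)), whereas `topDown(rule)` fails because the rule does not apply at the root.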


4 Strategic Rewriting for a Chemical Reactor 


The purpose of an automated generator of detailed kinetic mechanisms is to 
take as input one or more hydrocarbon molecules and the reaction conditions, 
and to give as output a reaction model, i.e. the list of applied reactions. We are 
interested only in exhaustive generation of chemical reactions, therefore we con- 
sider only the qualitative aspects of the model; the quantitative or probabilistic 
features are treated separately by the chemists. For this kind of modeling the 
two dimensional model of molecules is sufficient. 

In this section we present the model used for the representation of chemical 
species, the reaction pattern we considered, and the reactor dynamics. 


4.1 Molecular Graphs 


We now describe formally the chemical model we want to implement. 





Fig. 3. Molecular graphs 


A molecular graph [12] is a vertex-labelled and edge-labelled graph, where
each vertex is labelled with an atom and each edge graphically suggests the bond
type or is explicitly labelled with the bond type, as illustrated in Figure 3. A
chemical reaction is expressed as a rewriting rule for molecular graphs. Figure 4
gives an example of a chemical reaction.


4.2 Rules for Decorated Labelled Graphs 


In the so-called primary mechanism, a set of nine reaction patterns is applied to
an initial mixture of molecules. A complete description of the involved reaction
patterns is out of the scope of this paper, but the chemistry-like presentation
in Figure 5 gives the flavor of the transformations that need to be encoded.


Fig. 4. Bimolecular initiation for ethylbenzene


Name   Description

ui     x-y → •x + •y
bi     O=O + H-x → •OOH + •x
ipso   •H + Ar-x → H-Ar + •x


Fig. 5. Reaction patterns of primary mechanism: patterns involve simple (-) or
double (=) bonds, free radicals (•x), specific atoms (O, H); variables x, y, z can
be instantiated by any reactants


Every reaction pattern is also guarded by “chemical filters”, i.e. chemical
conditions of application, not detailed here, even though several of them are currently
implemented: they include considerations on the number of atoms in involved
molecules or free radicals, the type of radicals or the type of bonds, etc. Some
of them are discussed in [10].


4.3 Primary Mechanism 


The primary mechanism can be described as the result of three stages (see Figure 6):


1. The initiation stage: unimolecular and bimolecular initiation reactions, (ui)
and (bi), are applied to initial reactants, i.e. to the initial mixture of molecules.
Let $RS_0 \supseteq RS$ be the set of all reactants that can be obtained.

2. The propagation stage: the set of reactions, (ipso), (me), (bs), (ox), and
(co.O.), are applied to all reactants in $RS_j$ to obtain a new set $RS_{j+1}$ of
reactants. The reactants from $RS_j$ are then added to $RS_{j+1}$. This step is
iterated until no new reactant is generated.

3. The termination stage: combination and disproportionation reactions, (co)
and (di), are applied to free radicals of $RS_j$ to get a set $RS'$ of molecules.


INITIATION : PROPAGATION : TERMINATION


Fig. 6. Primary mechanism


The initial mixture of molecules, RS, is a finite set of reactants. Working
only on the qualitative aspects of this chemical problem, we are not interested
in the quantity or concentration of each reactant; hence, we assume an infinite
supply of each element in the current set of reactants.

The set of reaction rules $R$ is partitioned into three sets $R_i$, $R_p$, and $R_t$ where
$R_i = \{(ui), (bi)\}$, $R_p = \{(me), (ipso), (bs), (ox), (co.O.)\}$, and $R_t = \{(co), (di)\}$.

For expository reasons we consider here that all reactions have the generic 
form $m_1 + m_2 \to m'_1 + m'_2$, where at most one reactant on each side of the rule
can be a “dummy” reactant which is always present in the set of reactants. 


P(S) UNIT (R : P(R), P1 : P(S) [, P2 : P(S)])
begin
  P' := ∅;
  while (¬terminate()) do
    (m1, m2) := select(P1 [, P2]);
    for all (m1 + m2 → m1' + m2') ∈ R
      P' := insert(P', m1', m2');
    fi
  od
  return P'
end


Fig. 7. The UNIT algorithm


The algorithms for the reactor dynamics for each stage have a common part,
which we call UNIT (Figure 7), parametrized by a set of reaction rules and one or
two input sets of reactants. select($P_1$) returns randomly each time a new pair of
reactants from $P_1$ without removing them from $P_1$. As expected, select($P_1$, $P_2$)
returns randomly a new pair of reactants, the first from $P_1$ and the second from $P_2$,
without removing the reactants from the two sets. insert($P$, $m'_1$, $m'_2$) adds the
two reaction products $m'_1$ and $m'_2$ to $P$ if they are not already in $P$. The function
terminate() returns false as long as there are reactants that can interact by
means of rules from $R$.


P(S) AlgInit (P0 : P(S)) 
begin 
  return P0 ∪ UNIT(Ri, P0) 
end 

P(S) AlgPropag (P0 : P(S)) 
begin 
  i := 0; 
  P'' := ∅; 
  repeat 
    P'' := P'' ∪ Pi; 
    Pi+1 := UNIT(Rp, P'', Pi) \ P''; 
    i := i + 1; 
  until Pi = ∅; 
  return P''; 
end 

P(S) AlgTermin (P0 : P(S)) 
begin 
  return P0 ∪ UNIT(Rt, P0) 
end 





Fig. 8. The stage algorithms 


Now the algorithms for the three stages can be written in a rather uniform 
way, as given in Figure 8. 

We consider that the three stages of the reactor are executed sequentially, as 
required by the chemical hypotheses. The reactor dynamics is therefore described 
in Figure 9. 


AlgTermin(AlgPropag(AlgInit(P0))) 


Fig. 9. The reactor dynamics 
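
The composition of Figure 9 can be sketched in plain Java. This is only a minimal illustration under stated assumptions, not the TOM prototype: reactants are modelled as strings, the `Rule` interface and the class `ReactorSketch` are hypothetical names, and the random select/terminate loop of UNIT is replaced by an exhaustive enumeration of pairs, which for a finite input should yield the same set of products. 

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ReactorSketch {
    /** Hypothetical rule type: products of m1 + m2, or an empty list if the rule does not apply. */
    interface Rule {
        List<String> apply(String m1, String m2);
    }

    /** UNIT: collect the products of every rule on every pair (m1 in p1, m2 in p2). */
    static Set<String> unit(List<Rule> rules, Set<String> p1, Set<String> p2) {
        Set<String> products = new LinkedHashSet<>();
        for (String m1 : p1) {
            for (String m2 : p2) {
                for (Rule r : rules) {
                    products.addAll(r.apply(m1, m2));
                }
            }
        }
        return products;
    }

    /** AlgInit and AlgTermin share one shape: P0 union UNIT(R, P0). */
    static Set<String> oneShot(List<Rule> rules, Set<String> p0) {
        Set<String> result = new LinkedHashSet<>(p0);
        result.addAll(unit(rules, p0, p0));
        return result;
    }

    /** AlgPropag: iterate UNIT on the newly produced reactants until none is new. */
    static Set<String> propagate(List<Rule> rules, Set<String> p0) {
        Set<String> all = new LinkedHashSet<>();
        Set<String> fresh = new LinkedHashSet<>(p0);
        while (!fresh.isEmpty()) {
            all.addAll(fresh);
            Set<String> next = unit(rules, all, fresh);
            next.removeAll(all);   // keep only reactants not seen before
            fresh = next;
        }
        return all;
    }

    /** Reactor dynamics: AlgTermin(AlgPropag(AlgInit(P0))). */
    static Set<String> react(List<Rule> ri, List<Rule> rp, List<Rule> rt, Set<String> p0) {
        return oneShot(rt, propagate(rp, oneShot(ri, p0)));
    }
}
```

In this sketch the restriction of termination rules to free radicals is left to the rules themselves: a rule simply returns an empty list for pairs it does not apply to. 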


5 A Formal Island Implementation of the Primary 
Mechanism 


We present in this section how the primary mechanism is implemented in TOM 
using the formal island principle. 
The TOM implementation involves four steps, in order to design: 


1. An algebraic view of molecular graphs, as a set of terms on a convenient 
signature. 

2. A representation mapping that establishes a correspondence between alge- 
braic terms and Java objects. (This is the formal anchor.) 

3. Reaction rules implemented with match constructs: the left-hand side con- 
sists of a TOM term, while the right-hand side is a mixture of Java code and 
TOM constructs. 

4. Strategies for applying the reaction rules within each stage, and the chaining 
of stages. 


In the following subsections, we develop each of these steps. 


5.1 Molecular Graphs Viewed as Algebraic Terms 


A molecular graph (see Figure) is encoded by a term, as proposed in the 
linear notation SMILES presented in [31]. Representing graphs as terms is a 
design choice: terms provide an intermediate structure between graphs and their 
representation by adjacency lists, and this structure appears to be well suited 
to the patterns specific to our application. 

We briefly recall the principles of this representation. Molecules are repre- 
sented as hydrogen-suppressed molecular graphs (hydrogen atoms are not rep- 
resented) with atom-labelled vertices and bond-labelled edges. If the hydrogen- 
suppressed molecular graph has cycles, it can be transformed into a tree by 
applying the following rule to every cycle: arbitrarily choose one simple- or 
aromatic-labelled edge of the cycle, delete the edge, and add a fresh digit and the 
label of the edge to the labels of the formerly adjacent vertices. This corresponds 
to a spanning tree of the molecular graph. A vertex is chosen as root, and the 
tree is represented in a (semi)parenthesized preorder traversal (the parentheses 
are omitted for the right-most child of each vertex). Moreover, an aromatic cycle 
is represented by lower case letters, and the aromatic and simple bonds are not 
represented. 
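
The (semi)parenthesized preorder traversal described above can be illustrated by a short Java sketch. It is an assumption-laden toy, not the SMILES algorithm of the prototype: the classes `TreeNotation` and `Vertex` are hypothetical, each vertex label is a ready-made string (an atom symbol, possibly followed by the digits of broken cycles), and bond labels are ignored. 

```java
import java.util.Arrays;
import java.util.List;

final class TreeNotation {
    /** Hypothetical vertex of a decorated molecular tree: a label plus ordered children. */
    static final class Vertex {
        final String label;
        final List<Vertex> children;

        Vertex(String label, Vertex... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Preorder traversal; every child except the right-most one is parenthesized. */
    static String linearize(Vertex v) {
        StringBuilder sb = new StringBuilder(v.label);
        int last = v.children.size() - 1;
        for (int i = 0; i <= last; i++) {
            String child = linearize(v.children.get(i));
            sb.append(i < last ? "(" + child + ")" : child);
        }
        return sb.toString();
    }
}
```

For a root C with a two-atom branch C-C and a right-most child O, `linearize` yields `C(CC)O`; a simple chain yields no parentheses at all. 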











Fig. 10. From a cyclic molecular graph to an acyclic decorated molecular graph 


In the first molecular graph from Figure 10, two edges are transformed into 
implicit edges: (i) edge {6,11} labelled with simple is hidden and encoded by 
labels (1, simple) on vertices 6 and 11; (ii) edge {5,6} labelled with aromatic is 
hidden and encoded by labels (2, aromatic) on vertices 5 and 6. The aromaticity 
of a bond is propagated to its end vertices, which are labelled by lower case 
letters in the SMILES notation, and by upper case letters prefixed by ar in the 
signature. For example, if the vertex number 1 is chosen as root, a linear notation 
is CCc(ccc12)cc2C=C01; if the root is the vertex number 3 another notation is 
C(CC)(ccc12)cc2C=C01. 

The user must provide as input for this prototype a list of molecules in the 
SMILES notation. The associated TOM terms are quite heavy to handle, hence 
the user does not need to deal with them. Moreover, the user can use a Java 
based software provided by Chemaxon, called Marvin [2], which allows editing 
and visualizing molecules on a web page: by simply drawing a molecule, one gets 
its SMILES notation. 

The syntax for the TOM terms encoding decorated molecular graphs is given 
in Figure 11. The operator lab constructs a label composed of an integer and a 
bond type encoding an implicit edge, while the operator symb constructs a label 
for vertices composed of an atom name and a list of labels. 


sorts Atom Bond Label LabelList Symbol Reactant ReactantList 
abstract syntax 
Atom 
Atom 
Atom 
Atom 
Atom 
none -> Bond 
simple -> Bond 
double -> Bond 
triple -> Bond 
arom -> Bond 
lab(no:int, bond:Bond) -> Label 
concLab( Label* ) -> LabelList 
symb(atom:Atom, labels:LabelList) -> Symbol 
rct(bond:Bond, symbol:Symbol, rctList:ReactantList) -> Reactant 
conc( Reactant* ) -> ReactantList 





Fig. 11. The signature for TOM terms 


We represent a decorated molecular tree as a term of sort Reactant as follows: 


— a leaf v is a term of sort Reactant, 


rct(b, symb(a, concLab(labs*)), conc()) 


where a encodes the label of the leaf (an atom symbol), b encodes the label 
of the edge connecting v with its parent, and labs* is a possibly empty list 
of pairs of integers and bond types representing the associated set of broken 
cycle labels; 

— an internal vertex is a term of sort Reactant, 


rct(b, symb(a, concLab(labs*)), conc(rcts*)) 


where rcts* encodes the list of its children, each represented as a term; 

— the root has a dummy bond label, none, for uniformity reasons. 

Operation symbols like conc above represent variadic associative operators 
that construct a list from their (possibly zero) arguments. 

We consider that a radical point is an atom of valence 1 labelled by e (for 
electron). For efficiency reasons, we consider all free radicals (such as •x in 
Figure 5) to have tree representations where the electron is the root. 

The signatures for GasEl terms and TOM terms are slightly different, but the 
principles for building the terms are the same. The differences arise from 
restricting TOM signatures to many-sorted ones, while in ELAN one can use order-sorted 
signatures. The operation symbols in TOM are given in prefix notation and are 
always explicit. 


5.2 Mapping Construction 


In order to define the necessary abstract data types, we use the signature definition 
mechanism (%typeterm, %typelist, %op, etc.) provided by TOM. 

For example, given a Java class Reactant, we can define the following alge- 
braic mapping for it: 


%typeterm Reactant { 
  implement { Reactant } 
  equals(t1, t2) { t1.equals(t2) } 
} 





where the class Reactant has the following structure: 


class Reactant { 
  private Bond bond; 
  private Symbol symbol; 
  private ArrayList rctlist; 
  public Reactant(Bond bond, Symbol symbol, ArrayList rctlist) {...} 
} 





We can define the following constructor for the Reactant type: 


%op Reactant rct(bond:Bond, symbol:Symbol, rctlist:ReactantList) { 
  is_fsym(t) { t instanceof Reactant } 
  get_slot(bond,t) { t.getBond() } 
  get_slot(symbol,t) { t.getSymbol() } 
  get_slot(rctlist,t) { t.getRctlist() } 
  make(bond,symbol,rctlist) { new Reactant(bond, symbol, rctlist) } 
} 


In fact, this algebraic operation is a mapping from algebraic terms to Java 
objects that preserves the structural properties of Reactant sorted terms for 
Reactant Java instances, i.e. it is a formal anchor. Recall that the formal 
anchor is determined by the semantics of three mappings: eq, is_fsym, subterm. 
The construct %typeterm contains the definition of eq, which is equals. The 
other two mapping definitions are given by means of the %op construct for the 
operation symbol rct: the mapping is_fsym(t, rct) is implemented by the 
construct is_fsym(t), while the mapping subterm(t, i) is implemented by three 
constructs get_slot for retrieving each of the three arguments of rct. 

Instead of explicitly building this mapping, we can use the two external tools 
developed together with TOM, Vas and ApiGen, to generate Java files 
implementing the signature. In this way, we take advantage of the ATerm library and 
the VisitableVisitor design pattern, which are automatically implemented by the 
generated classes. The memory sharing provided by the ATerm library is very 
important for the implementation of reactants because the terms encoding them 
have in general many common subterms, while the Visitor pattern is necessary for 
doing term traversals. 

The construct %vas allows defining a Vas grammar in a .tom file: 


%vas { 

module data 
imports ... 
public 


sorts Atom Bond CLabel CLabelList Symbol Reactant ReactantList ... 
abstract syntax 





Considering the signature described by Figure 11, after running Vas, some 
standard directories are generated containing all classes that make up the API for 
the signature. At the root level, the directory contains several standard classes 
and the mapping for TOM (data.tom). The subdirectory types contains 
abstract base classes for each sort defined in the signature, and one subdirectory 
per sort that contains concrete classes for each operator of this co-arity. 

The TOM implementation uses a specialized version of the Visitor design 
pattern, the VisitableVisitor pattern, based on the visitor combinators concept 
introduced in [29] which allows composition and full tree traversal control. The 
basic visitor combinators are inspired by the strategy primitives of Stratego which 
are presented in Figure [I] (except the Omega strategy). The Java classes gener- 
ated for the algebraic operations defined within a Vas construct implement the 
Visitable interface. On one side, the built-in or user defined traversal strategies 
are visitable as algebraic terms; on the other side they define visit_Sort and 
visit_ValueSort_OperationSymbol methods necessary for visiting algebraic 
operations. 


5.3 Reaction Rules 


The reaction rules have the form: 

$r : t_1\,[+\,t_2] \rightarrow t'_1 + t'_2 \ \text{ if } cond$ 


(where the elements between square brackets are optional), and we implement 
them using a match construct according to the following schema: 


%match(Reactant subject1 [, Reactant subject2]) { 
  t1 [, t2] -> { 
    if(cond) return pair(σ(t1'), σ(t2')); 
  } 
} 





where the argument of match is the term we want to rewrite (the reactant), and 
σ is the substitution resulting from the matching process. Note that only 
the implementations of termination rules have two reactants in their left-hand 
sides. 

For all types of reaction rules, we define a base class ChemicalRule which 
encloses the common features of all reaction rules. For each reaction application, 
we determine the reaction products and its degeneration (how many times the 
reaction can be applied in different parts of reactants with equal results). 

In GasEl one of the implementation difficulties was to have exhaustive appli- 
cation of a reaction rule on one or two reactants. Since the reaction rules are 
encoded in ELAN as named strategies which can be applied only at the top of 
a term, exhaustive application in GasEl is achieved by generating all tree-like 
visions of an acyclic decorated molecular graph (a vision is obtained by choosing 
a root on a spanning tree). 

In TOM this problem is handled in an elegant way by using the strategy 
Omega (Figure 1). Given a term $t$ and a rewrite rule $r : t_1 \rightarrow t_2$, the Omega 
strategy provides the following features: 


— we can apply a topdown (or other traversal) strategy for solving the matching 
problem $t_1 \ll t$; successful matches give rise to a family of substitutions 
$\sigma_i$; 

— for each match solution $i$, the position $\omega_i$ in $t$ where the pattern matched can 
be retrieved as a Java object by means of the static method getPosition() 
of the class MuTraveler; 

— for a position $\omega_i$, the subterm $t_{|\omega_i}$ is returned by the method getSubterm(); 

— for a position $\omega_i$, the term resulting from $t$ after applying the rewriting rule 
$r$ at $\omega_i$, i.e. $t[\sigma_i(t_2)]_{\omega_i}$, is computed using the method getReplace($\sigma_i(t_2)$). 


This is, to our knowledge, an original feature that provides full control over 
the application of a rewriting rule and allows a wide range of applications. In 
particular, it is quite convenient for applying a reaction rule. 
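
The idea of retrieving and replacing a subterm at a position can be sketched independently of TOM. The following Java fragment is a hedged illustration only: `PositionSketch`, `Term`, `subterm` and `replace` are hypothetical names, positions are assumed to be paths of 0-based argument indices, and nothing here reproduces the actual MuTraveler or Position API. 

```java
import java.util.Arrays;
import java.util.List;

final class PositionSketch {
    /** Hypothetical immutable term: a head symbol and its ordered arguments. */
    static final class Term {
        final String head;
        final List<Term> args;

        Term(String head, Term... args) {
            this.head = head;
            this.args = Arrays.asList(args);
        }

        @Override
        public String toString() {
            if (args.isEmpty()) return head;
            StringBuilder sb = new StringBuilder(head).append("(");
            for (int i = 0; i < args.size(); i++) {
                if (i > 0) sb.append(",");
                sb.append(args.get(i));
            }
            return sb.append(")").toString();
        }
    }

    /** Subterm of t at position pos (a path of 0-based argument indices). */
    static Term subterm(Term t, int... pos) {
        for (int i : pos) t = t.args.get(i);
        return t;
    }

    /** Copy of t in which the subterm at pos is replaced by s; t itself is not modified. */
    static Term replace(Term t, Term s, int... pos) {
        return replaceFrom(t, s, pos, 0);
    }

    private static Term replaceFrom(Term t, Term s, int[] pos, int depth) {
        if (depth == pos.length) return s;
        Term[] newArgs = t.args.toArray(new Term[0]);   // fresh copy of the argument list
        newArgs[pos[depth]] = replaceFrom(newArgs[pos[depth]], s, pos, depth + 1);
        return new Term(t.head, newArgs);
    }
}
```

Because `replace` rebuilds only the path from the root to the position, the original term stays available, which is what allows one rule to be applied at several positions of the same subject. 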

From the implementation point of view, there are two classes of reaction rules: 
the first class consists of the reactions (ui), (bi), (me), and (ipso), which are 
implemented by a topdown traversal of a term in search of a reaction 
pattern, while the second class consists of the rest of the reactions, for which 
the pattern (with the radical point) is always searched at the root. We illustrate 
these two types of implementation with the following two examples. 


Example 2. [Bimolecular initiation reaction] The generic reaction is: 

O=O + H−x → •OOH + •x 


and an application is illustrated in Figure 4. The result of applying the (bi) 
reaction rule on a term subject is implemented by means of the following code: 


if( !containsElectron(subject) && (nC(subject) > 1)) { 
  VisitableVisitor birule = new BiRule(); 
  `TopDown(Try(birule)).visit(subject); 
  this.setResultList(birule.getResultList()); 
} 





First we test if the reactant does not contain a radical point (encoded as an 
electron), and if it contains at least two carbon atoms. If the test is successful, 
then we apply in a topdown manner a rule, instance of the class BiRule. 

For every subterm of sort Reactant, during the top-down traversal of the 
subject of the reaction, the following method of the object birule is applied: 


public Reactant visit_Reactant(Reactant arg) throws VisitFailure { 
  Reactant r1, r2; 
  int n; 
  %match(Reactant arg) { 
    rct(b, symb(C(), concLab(labs*)), conc(rcts*)) -> { 
      n = nH(arg); 
      if( n >= 1) { 
        Position pos = MuTraveler.getPosition(this); 
        r1 = insertElectron(pos.getSubterm().visit(globalSubject)); 
        r2 = hangE(pos.getReplace(r1).visit(globalSubject)); 
        addMPack(`mpack(n, pack(ctRcts.eoo, ctRcts.seoo), 
                        pack(r2, usmiles(r2)))); 
      } 
    } 
  } 
  return `Fail().visit(arg); 
} 





The variable globalSubject is set to the value of the term participating in 
the reaction. We search within the term for a non-aromatic carbon atom which 
has at least one hydrogen bonded to it by examining all subterms of sort Reactant. 
nH computes the number of hydrogen atoms connected to the C atom. 

We attach an electron to the carbon atom that was found, we insert the new term in 
the context, and then we twist the term by means of hangE such that the node 
labelled by e becomes the root in the corresponding molecular tree, in order to 
preserve the chosen representation of free radicals. 

A term of sort Pack represents a pair composed of a Reactant term and its 
SMILES form computed with the algorithm presented in [31]; eoo is a constant 
term corresponding to •OO, while seoo is the canonical form of eoo; n is the 
degeneration of the reaction. The method addMPack adds an element consisting 
of a pair of Pack-sorted terms with the multiplicity n to a private list of this 
type; this list represents the result of the exhaustive application of a particular 
reaction rule. 


Example 3. [Beta-scission reaction with no cycle breaking] The generic reaction 
is: 

•x−y−z → x=y + •z 

This reaction rule described by subgraphs is easily translated into a rule over 
trees (as we can see schematically in Figure 12) which is matched at the top of 
a term (because the electron is always placed in the root). 





Fig. 12. Beta-scission on terms 


5.4 Reactor Strategy 


We present in this section the implementation of the reactor dynamics formally 
described by the algorithms in Figures 8 and 9. We implement the function 
UNIT given in Figure 7 by means of the visitor class UnitRule with a private 
member consisting of an array of chemical rules: 


class UnitRule extends data.dataVisitableFwd { 
  private Object rules[]; 
  public UnitRule(Object rules[]) { 
    super(`Fail()); 
    this.rules = rules; 
  } 
  public PairPackList visit_PairPackList(PairPackList arg) { ... } 
  public PackList visit_PackList(PackList arg) { ... } 
} 


UnitRule can be used as a rule with a particular behavior on terms of sorts 
PairPackList and PackList. Each of the visit_Sort methods contains 
applications of the rules passed as arguments on lists of reactants. 

The initiation stage described by UNIT($R_i$), given in Figure 8, is 
implemented as follows: 


ChemicalRule initRules[] = 
  {new UICCRule(), new UICHRule(), new BiRule()}; 
VisitableVisitor init_unit = new UnitRule(initRules); 
plist = `Try(init_unit).visit(plist); 





where plist from the right-hand side is the input list of chemical reactants 
(the initial set of reactants), while plist from the left-hand side contains the 
products obtained from the initiation stage together with the input reactants. 


For the propagation stage, chemical hypotheses require that the reactions 
(me) and (ipso) be applied only to the products resulting from the initiation stage. 
Therefore we describe the propagation stage by means of the strategy 
UNIT($R_p$); repeat(UNIT($R_p$ − {(me), (ipso)})), and we implement it as 
follows: 


ChemicalRule propagRules1[] = {new MeRule(), new IpsoRule(), 
  new BSCCRule(), new BSCHRule(), new OxRule(), new CombeOOeRule()}; 
VisitableVisitor propag_unit1 = new UnitRule(propagRules1); 
tmplist = `Try(propag_unit1).visit(plist); 
tmplist = diff(tmplist, plist); 
plist = appendLists(plist, tmplist); 
pairlist = `pair(plist, tmplist); 
ChemicalRule propagRules2[] = {new BSCCRule(), new BSCHRule(), 
  new OxRule(), new CombeOOeRule()}; 
VisitableVisitor propag_unit2 = new UnitRule(propagRules2); 
pairlist = `RepeatId(Try(propag_unit2)).visit(pairlist); 
plist = getFirstList(pairlist); 





First we put the reaction products from all propagation rules in tmplist, 
then we select only the free radicals not already in the input list, and put them 
together with the initial reactants. We make a pair of lists with the first element 
consisting of all reactants, and the second element consisting of the list of new 
free radicals, and we provide it as input for the strategy that applies the chemical 
rules from the array propagRules2. The application of this strategy ends when 
the list of new free radicals is empty. The result of the propagation stage consists 
of the list of all products concatenated with the list of input reactants. 


The termination stage described by UNIT($R_t$) is implemented in TOM as 
follows: 


ChemicalRule terminRules[] = {new CoRule(), new DiRule()}; 
VisitableVisitor termin_unit = new UnitRule(terminRules); 
plist = `Try(termin_unit).visit(plist); 





For a given list of input molecules, this prototype writes in a file the chemical 
products for each stage as well as the elementary reactions that took place during 
the entire mechanism. 


6 Conclusion 


The first output of this work is a new prototype of a chemical reactor. First 
results revealed good properties with respect to chemical validation of the model. 
In all but one case, this prototype is faster (less than 13 seconds) than GasEl. 
Moreover, for non-cyclic molecules with 16 carbon atoms and a large number of 
simple bonds, this implementation in TOM is up to 9 times faster than GasEl. 
The execution times for the two prototypes have been compared on all examples 
validated with chemists and presented in [16]. For the most complex molecule 
tested (JP10), not completely handled in [16], the prototype in TOM was able to 
terminate with 1165 generated reactions in 139 minutes. A complete comparison 
between the GasEl prototype and the current implementation in TOM is not 
trivial, due to notation and implementation differences, and is out of the scope 
of this paper. 

It may be worth noticing that the rule-based approach on graph structures 
has also been studied in the modelling of signal transduction networks and 
metabolic pathways [23] in the domains of biological systems and protein inter- 
actions. Our model of chemical reactor seems to be easily adaptable to these 
domains. 

Our second concern was to explore the formal island concept and 
methodology on a significant example. The objective of the formal island approach, 
which is to extend the expressivity of the host language with higher-level 
constructs at design time, is well illustrated in this example. From this point of 
view, TOM appeared to be quite convenient for implementing chemical rules 
with conditions and actions expressed in the Java host language. On the other 
hand, control was expressed with a high-level language of strategies, which now 
makes it possible to reason about formal properties, especially the termination 
property of each phase [4]. This illustrates the idea of performing formal proofs 
on the formal island constructions. 

A further idea would be to implement a new version of the TOM compiler 
able to perform graph rewriting. Representing cyclic structures in TOM is not 
too difficult but matching and rewriting have to be adapted to this context. 
Indeed this capability would open new application areas. 

A long-term objective of the formal island approach is to certify the imple- 
mentation of the formal island compilation into the host language. A first step 
in this direction has been presented in [19] to generate proof obligations for 
the compilation of matching. A similar effort is underway for rewriting and 
strategies. 

A further improvement of the formal island approach is to anchor other 
language extensions, especially modules and parameters, while improving the 
capacity of the compiler to generate verification requirements related to the 
properties to be checked. 


This work is dedicated to our colleague Joseph Goguen, who was extremely influential 
in the design of modern programming languages. 


1 Introduction 


Rigorous program development is notoriously difficult because it involves many as- 
pects, among which specification, programming, verification, code reuse, maintenance, 
and version management. Besides, these various tasks are interdependent, requiring go- 
ing back and forth between them. In this paper, we are interested in certain language 
features and in languages which help make the user’s life easier for developing pro- 
grams satisfying their specifications. 

Our interest focuses on three implemented specification/programming languages, 
OBJ [408], ML [27] and Coq [10], which have played an important historical role in 
the process of coming up with better languages. And indeed, both OBJ and ML had 
many successors or dialects, among which OBJ3 [20], Cafe-OBJ [28], Maude [9] and 
ELAN [2] for OBJ, and SML [23], CAML and OCaml among others for ML. 
Coq has evolved with many different versions keeping the same name, following the 
evolution of type theory from the calculus of constructions to the extended calculus 
of constructions and the development of the theory of inductive types from 
Martin-Löf’s type theory to the calculus of inductive constructions [12,31]. Other proof 
assistants based on a similar historical development include Lego [21], Alf and 
Agda/Alfa [1]. Coq remains the most mature and widely used of them all. 

We explain briefly in the introduction what important properties are shared by these 
three languages, and how OBJ has been influential in such a way that many important 
characteristics of ML and Coq were already present in OBJ, sometimes in disguise. 
In what sense these three languages can be considered as specification languages, or 
programming languages, or proof development systems is another important aspect we 
are interested in. 

The user does not like doing things twice. Writing a specification in one language 
before coding it in another language is more than a challenge: it is hopeless. The 
coding part must be automated, as is the case in all three languages we are interested in. 
This automation obeys the same principle: forgetting the non-executable subpart of the 
specification or of its proof. 

A specification is nothing but a logical property of the form $\forall \overline{x}.\ P(\overline{x}) \rightarrow Q(\overline{x})$, 
where $\overline{x}$ is the vector of data, $P(\overline{x})$ is the assumption, and $Q(\overline{x})$ is the conclusion. 
Therefore, the specification/programming language must contain (possibly via an en- 
coding) a mechanism for expressing properties, as well as one for expressing compu- 
tations and, possibly, a last one for expressing proofs. In ML, the specification part is 
simple enough to be inferred automatically by the system from the user’s functional 
program: this is called type inference. The program then satisfies this (extremely poor) 
specification without requiring any further proof. In OBJ, specifications are algebraic, 
that is, conditional equations giving meaning to the various functions and predicates in- 
troduced by the user, and are executable via rewriting. Showing that the rewrite program 
implements the specification requires several checks (confluence and termination) left 
to the user. Proving properties of an OBJ specification can be done in the language itself 
by using reflection; this has been done in Maude and ELAN, as well as in CafeOBJ, to 
a limited extent. Coq uses higher-order intuitionistic logic as a specification language, 
and includes the possibility to carry out the development of a (constructive) proof of 
the specification by using a tactic language which generates a Coq term representing 
the proof. A functional program meeting the specification can then be extracted 
automatically from that proof by erasing all subexpressions without computational content, 
which differ from the others by their type. 


The old paradigm that the same program piece can be used several times in a bigger 
program with different data has led to a first notion of abstraction, giving rise to the 
notion of function, or subprogram. The idea that a program operating upon certain data 
should not depend upon the way they are actually represented has led to the notion of 
abstract data type. The same paradigm applied to groups of functions or subprograms 
achieving some well-defined task, processing some well-defined data, has led to the 
notion of module. Abstracting over modules themselves has led to the notion of functor. 
All three languages have pioneered the design of modules and functors in their respec- 
tive areas, not to speak about abstract types, and OBJ has been very influential in this 
matter. 


Object-orientation is a different, important abstraction mechanism that is not part of 
our three languages, and indeed, the first two have been extended so as to include object- 
oriented features. We will not say more about object orientation, although OCaml, a 
dialect of ML, played an important role in popularizing object orientation among the 
community of functional programmers. 


Among the programming tasks that should be eased by a good language choice, 
only the last one, version management, is not taken care of at all by our three languages. 
Some of the other tasks are better taken care of by OBJ or by ML or by Coq. In 
particular, the verification principles behind these languages differ in the expressivity of their 
underlying specification language. In OBJ, typing looks very elementary, since OBJ 
static types are checked in linear time by a bottom-up tree automaton. But OBJ types 
are not all static, requiring some runtime type-checking as well. In ML, static typing is 
more advanced, with a polymorphic type discipline for which types can be inferred by 
an exponential (but practically linear) algorithm. In Coq, types are arbitrary formulas of 
higher-order (intuitionistic) logic which can be checked in finite (but indefinite) time, 
and cannot be inferred in general. This typing system generalizes both OBJ’s and ML’s 


218 Jacek Chrzaszcz and Jean-Pierre Jouannaud 


typing as we will see. Verification can also be achieved by model checking or testing. 
Both are lacking in OBJ, ML and Coq, but can of course be made available as tactics in 
OBJ’s successors and Coq. 

The quest for the ideal programming language will continue until a satisfactory 
language is designed that internalizes features still taken care of by the user or by the 
programming environment. 


2 The Three Languages 


2.1 OBJ 


In their first landmark paper on CLEAR, Rod Burstall and Joseph Goguen introduced 
the brand new bright idea that specifying a program required a specific language able 
to reflect the structure of the problem itself [6]. Following the ADJ group [16,15], they 
advocated for an algebraic specification language based on equational logic, together 
with a module system in which logical theories could be specified. This was the birth 
of CLEAR, later developed more formally in [7]. To our knowledge, CLEAR was the 
very first specification language. CLEAR was algebraic, using many-sorted algebras 
with error-sorts, an approach later revised to yield OBJ’s order-sorted algebras. CLEAR 
had parameterized modules and theories, but no functors, and was not implemented, 
although one can consider that the first implementation of OBJ by Joseph Tardo [17], a 
student of Joseph Goguen at UCLA, was indeed an implementation of CLEAR. A 
second, more advanced implementation was then written by David Plaisted when visiting 
Joseph Goguen at SRI in 1982, which included associative-commutative rewriting. 

OBJ2 was the third implementation of OBJ. It was developed in 1984, when 
Kokichi Futatsugi and the second author visited Joseph Goguen and José Meseguer 
at SRI for one year. OBJ2 was the first algebraic specification language based on a 
fragment of a Horn logic built on the equality predicate and finitely many membership 
predicates called subsorts [14]. The many novel features of OBJ2 included a flexible 
user-defined syntax, defining subsorts by Horn sentences, rapid prototyping via 
rewriting modulo associativity, commutativity, identity and their combinations, parameterized 
modules and functors. OBJ2 was followed by OBJ3 [20], an improved implementation 
developed by Claude and Hélène Kirchner, whose postdoctoral visit closely followed 
their advisor’s. Full Horn logic is available in the Maude language [9,3], one of OBJ’s 
successors developed by José Meseguer and his collaborators. 

An OBJ program is a collection of modules followed by queries. A module is either 
an object or a theory. A module has a name, which we always write with capital let- 
ters. Objects are made of two parts: a signature made of basic types called sorts, and of 
constructors and (defined) operators for these sorts; the meaning of the operators and of 
the subsorts is given by (executable) Horn clauses (called equalities or sort constraints 
depending on the predicate heading the positive atom). We will also use the name of 
membership for sort constraint, as in Maude. In general, the principal sort of a module 
bears the same name as the module itself, but with only the first letter capitalized.
Semantically, objects are initial algebras, implemented via the computation of normal forms:
the meaning of the defined operators must be given by a convergent set of conditional
rewrite rules (possibly modulo associativity, commutativity and the like). A theory is
much like an object except that it is not executable: its (loose) semantics is given by
the class of all algebras that satisfy the arbitrary first-order logical sentences specifying
its properties. The definition of an object or theory can use other objects or theories.
Three keywords control importation: using imports a module without ensuring any property
of the imported module, which must therefore be copied; protecting ensures that the imported
module is not modified, making copying unnecessary; extending stands in between,
since new values can be added to sorts, but old values cannot be made equal unless they
were equal beforehand. Parameterization is one more way of importing a module. If T
is a theory, parameterizing a module M by an abstract module X satisfying T allows
the symbols defined in T to be used in building M, possibly with qualification as a
disambiguation mechanism. The parameterized module M can later be instantiated by
an actual module A, provided A satisfies the axioms of T. Asserting such a module property
is done by a view, which is the third kind of entity in OBJ. The construction of the
instantiated module may also involve some copying.

OBJ has a much more powerful mechanism for defining types than it appears. Besides
its basic types called sorts, like ℕ and List, it also has type constructors: if the
module LIST is parameterized by an abstract module X assumed to satisfy the theory
T, then any type List(Elt) exists potentially, provided Elt is the sort of a module
satisfying T. This allows building the types List(ℕ) as well as List(List(ℕ)), therefore
providing some form of polymorphism. However, these types can only be used
if the corresponding module instances LIST[NAT] and LIST[LIST[NAT]] are
explicitly constructed. The same mechanism provides dependent types like bounded
lists of length n, where n can be a parameter of sort ℕ defined via a theory. OBJ also has
arbitrary first-order Horn sentences as types, written t : s’ if A, where A is an arbitrary
conjunction of equations and memberships built from the variables in t. OBJ’s subsort
declaration is a static restriction of this mechanism. So, OBJ’s type system was quite
strong at the time OBJ was implemented, and even has some Curry-Howard flavour. In
retrospect, theories themselves can be seen as types for modules, and a view then
becomes an assertion that a module has some theory as type.

OBJ’s types, however, only serve specification purposes. Unlike modern functional 
programming languages like ML, typing is not really internalized in OBJ: property 
checking is left to the user’s responsibility. Still, a limited amount of type-checking is 
done. For example, the left-hand and right-hand sides of an equality must have the same
sort, and the expression occurring in the head of a membership must have a sort of
which the asserted sort is a subsort.

OBJ specifications are assumed to satisfy a few other properties, all left to the user. 
For example, the set of rules in a module is supposed to be terminating and confluent, and
the operators should be completely defined. Maude provides support for checking these 
properties. 


2.2 ML 


ML was the first functional programming language in which specifications were given
(actually, inferred) as types, another novel bright idea from the late seventies due to
Robin Milner [27]. ML has a powerful higher-order module system, an efficient
execution model via separate compilation, and a primitive verification mechanism via type
inference.

An ML program is a collection of modules. A module is either a structure, which 
corresponds to an OBJ non-parametric object, or a functor which corresponds to a para- 
metric object. Contrary to the latter, ML functors can be higher-order, i.e. they can be 
parametrized by a module which itself is parametrized. Specification of a functor pa- 
rameter is given by a module type. This can either be a signature, corresponding to an 
OBJ theory, or a functor type. Contrary to OBJ theories, values cannot be specified by 
equations, but types can. 
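
As a rough illustration of this vocabulary, the following OCaml sketch places a signature, a structure and a functor side by side. All names (ELT, IntElt, MakeStack, IntStack) are hypothetical and chosen only for this illustration; they do not come from the example developed later.

```ocaml
(* A signature: plays the role of an OBJ theory
   (here, like TRIV extended with one operation). *)
module type ELT = sig
  type t
  val to_string : t -> string
end

(* A structure: plays the role of a non-parametric OBJ object. *)
module IntElt = struct
  type t = int
  let to_string = string_of_int
end

(* A functor: plays the role of a parametric OBJ object.
   Its parameter is specified by a module type. *)
module MakeStack (X : ELT) = struct
  type t = X.t list
  let empty : t = []
  let push (x : X.t) (s : t) : t = x :: s
  let show (s : t) : string = String.concat " " (List.map X.to_string s)
end

(* Instantiation is plain functor application. *)
module IntStack = MakeStack (IntElt)

let () = print_endline IntStack.(show (push 1 (push 2 empty)))
```

Note that ELT can only state the types of its components; an equation relating values could at best be recorded as a comment.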

Another difference is the lack of views in the ML module system. Since subtyping is 
implicit, a functor F, expecting an argument of type SIG, can be applied to all modules 
M, whose principal module type MSIG is a subtype of SIG. Using type inference, the 
principal module type can be computed efficiently and since subtyping is an extension 
of inclusion, views are not necessary. On the other hand, the OBJ views can also be 
used to rename components of an object, which in ML can only be done via a functor. 
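
A minimal sketch of this implicit subtyping, with hypothetical names SIG, F, M and FM: the functor F is applied to a module that offers more than SIG requires, and no view or coercion has to be written.

```ocaml
module type SIG = sig
  type t
  val zero : t
end

(* F only relies on what SIG lists. *)
module F (X : SIG) = struct
  let pair : X.t * X.t = (X.zero, X.zero)
end

(* The inferred (principal) signature of M also contains succ,
   so it is a subtype of SIG. *)
module M = struct
  type t = int
  let zero = 0
  let succ n = n + 1
end

(* Accepted as is: the type checker verifies the inclusion itself. *)
module FM = F (M)

let () = assert (FM.pair = (0, 0))
```

If M had named its constant differently, an adapter structure or functor re-exporting it under the name zero would be needed, which is the renaming role that views play in OBJ.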

The important feature of OBJ that is missing in ML is theory extension via key- 
words extending and using. Because equational specification of values is lacking in 
ML, signature inclusion, present in most ML implementations, is much weaker than its 
OBJ counterpart, hence cannot be seen as a substitute. Indeed, theory extension can be 
used as another means of parametrisation: assume one declares a function f of some 
type in a theory A and one then uses it in a subsequent equational specification of some 
function g; in a theory B extending A, one can then provide equations defining f, therefore
completing the specification of g at the same time. In fact, the specification of g is
parametrized by f. Similar ideas are currently being investigated by the ML community
with the so-called mixins [4,19].


2.3 Coq 


In the mid-eighties, following the path initiated by Curry, Howard, Girard and De 
Bruijn, Thierry Coquand and Gérard Huet made another important step with the beau- 
tiful Calculus of Constructions [11], in which types are arbitrary sentences of higher- 
order intuitionistic logic. This calculus was the start of the language Coq, a proof as- 
sistant including a full functional programming language as an executable subset. Coq 
has a powerful higher-order module system with cut elimination semantics studied and 
implemented by the first author [8], at that time a PhD student of the second author,
a primitive execution model via rewriting and an efficient execution model via com- 
pilation. It also includes a sophisticated proof search engine via tactics (and a tactic 
language), a secure proof checker based on type checking, and an extraction mecha- 
nism towards modular ML code. Here, it must be stressed that the module system is 
used to structure first specifications, then proofs, and finally the programs extracted 
from proofs. The latter is of course facilitated by the fact that the module systems of Coq
and ML are essentially the same. 

The logical formalism implemented in Coq is based on the calculus of inductive
constructions [12,31]. The terms in Coq are of two sorts: calculable Set and logical
Prop.³ Values are typed by types, which are typed by the sort Set (for example 0 : nat
and nat : Set). The second sort, Prop, is a type of logical formulas, which in turn are
types of their proofs (for example, a formula of the form A → A has the proof
fun x => x). In type theory with
dependent types these two worlds interleave, but it is nevertheless possible to use this
dichotomy in order to extract the computable content of a proof, by deleting all its
(logical) subterms of sort Prop.

The general structure of a Coq development is the same as that of an ML program. 
The main difference lies in logical parts: axioms in specifications and theorems in im- 
plementations. While in ML code precise specifications are usually written informally 
as comments and correctness is based on trusting the programmer, in Coq one can write 
specifications as logical formulas, and then carry out the proof that the specification is 
satisfied. 


3 Example 


To compare the modular features of the three languages, we shall study a simple sorting 
algorithm using an abstract priority queue. We also provide a naive implementation of 
the priority queue and show how the abstract algorithm can be composed with the given 
implementation. The obtained algorithm and data structure remain parameterized with 
respect to the element ordering, which can itself be instantiated later on. 

Priority queues are data structures implementing the following functionalities: cre- 
ation of an empty queue, insertion of an element into the queue and extraction of the 
minimal element from the queue. They can be realized very efficiently imperatively (Fi- 
bonacci heaps, binomial heaps, etc) but efficient functional implementations also exist 
(see e.g. B). 

Using a priority queue, one can implement the following sorting algorithm: insert all
elements into the queue and then extract them one by one. Several apparently different
sorting algorithms can be seen as instances of this abstract schema using a particular 
implementation of a priority queue: selection sort uses unsorted lists, insertion sort uses 
sorted lists and heapsort uses heaps. 
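
The schema can be sketched without any module machinery, as a function parameterized by the three queue operations. The record type pq and the names pq_sort, unsorted and sorted are hypothetical; the built-in polymorphic compare stands in for the element ordering, and only the two list-based instances are shown.

```ocaml
(* A priority queue described by its three operations. *)
type ('a, 'q) pq = {
  empty : 'q;
  insert : 'q -> 'a -> 'q;
  extract : 'q -> ('a * 'q) option;  (* None when the queue is empty *)
}

(* The abstract schema: insert everything, then extract one by one. *)
let pq_sort (pq : ('a, 'q) pq) (xs : 'a list) : 'a list =
  let q = List.fold_left pq.insert pq.empty xs in
  let rec drain q acc =
    match pq.extract q with
    | None -> List.rev acc
    | Some (x, q') -> drain q' (x :: acc)
  in
  drain q []

(* Unsorted lists: constant-time insert, linear-time extract.
   Instantiating the schema with it behaves like selection sort. *)
let unsorted : ('a, 'a list) pq = {
  empty = [];
  insert = (fun q x -> x :: q);
  extract = (fun q ->
    match q with
    | [] -> None
    | y :: ys ->
      (* m is the minimum seen so far, rest holds all other elements *)
      let m, rest =
        List.fold_left
          (fun (m, rest) z ->
             if compare z m < 0 then (z, m :: rest) else (m, z :: rest))
          (y, []) ys
      in
      Some (m, rest));
}

(* Sorted lists: linear-time insert, constant-time extract.
   Instantiating the schema with it behaves like insertion sort. *)
let sorted : ('a, 'a list) pq = {
  empty = [];
  insert = (fun q x ->
    let rec ins = function
      | [] -> [x]
      | y :: ys as l -> if compare x y <= 0 then x :: l else y :: ins ys
    in
    ins q);
  extract = (function [] -> None | y :: ys -> Some (y, ys));
}

let () =
  assert (pq_sort unsorted [4; 5; 1; 2] = [1; 2; 4; 5]);
  assert (pq_sort sorted [4; 5; 1; 2] = [1; 2; 4; 5])
```

The sorting function never inspects the queue representation, which is exactly what lets one schema cover several classical algorithms.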

This example, despite being so small and simple, illustrates quite well the modular 
features of our three languages and how they evolved from OBJ to ML and Coq. We 
show what a specification and an implementation of a data structure look like, how an
implementation of the data structure can be composed with an abstract algorithm, and 
how the resulting concrete but parametric algorithm can be instantiated and used in a 
program. 

Our example shows the advantages of each approach: in OBJ one can write very 
concise equational specifications, in ML specifications are very brief (and imprecise) 
but implementations are very efficient, and Coq allows one to formally specify and 
prove correctness of a data structure or algorithm. The comparison between ML and 
Coq further shows how much work is needed to formally specify and verify a piece of 
code. 

We will give the actual code of the example in the presentation. 


3 There are other sorts in Coq, namely the predicative hierarchy of Type_i, i ∈ ℕ, called
universes [22], but we do not use them in this paper.


4 Priority Queues in OBJ


We will take the liberty to exploit the full power of Maude and use its syntax when 
appropriate, to ease the understanding. Using OBJ instead would sometimes require 
some irrelevant detour. 


Specification of an ordered type, pairs, queues and priority queues. 


We define successively trivial theories with a distinguished sort, pairs, totally ordered 
sets, queues and priority queues. Being part of any OBJ specification, the predefined 
module BOOL has one sort, Bool, two (truth) values, true and false, and the usual
Boolean connectives as operations. In all examples, italics are used to identify OBJ 
keywords. All sentences are terminated by a dot for parsing purposes. Underscores are 
used to indicate arguments of operators which use a mixfix syntax. 





th TRIV is
  sort Elt .
endt

The theory TRIV requires the existence of (at least) one sort, named Elt. 


obj PAIR[X :: TRIV, Y :: TRIV] is
  sort Pair .
  op pair : Elt.X Elt.Y -> Pair .
  op 1st : Pair -> Elt.X .
  op 2nd : Pair -> Elt.Y .
  var E : Elt.X .
  var E' : Elt.Y .
  eq 1st(pair(E, E')) == E .
  eq 2nd(pair(E, E')) == E' .
endo

The parameterized object PAIR builds upon two formal objects X and Y satisfying
TRIV, which acts as a binder for the sort names Elt.X and Elt.Y, therefore
providing for the polymorphic sort constructor pair. Note the use of qualification for
disambiguating between the two instances of TRIV. The symbol == is used for equations
in theories and for rules in objects. It is also used for the built-in equality available
at all sorts. Similarly, : s is the built-in membership predicate available at sort s. In
the equations, the variables E, E’ and E’’ are universally quantified by the binding
declaration var.

th TOSET[X :: TRIV] is protecting BOOL .
  op _<_ : Elt Elt -> Bool .
  var E E' E'' : Elt .
  eq E < E == true .
  eq E == E' if E < E' and E' < E .
  eq E < E'' == true if E < E' and E' < E'' .
  eq E < E' or E' < E == true .
endt

The theory TOSET uses the module BOOL with the keyword protecting, which implies
two important properties: no new element of sort Bool can exist in the semantics (for
any two elements e, e’ of sort X, e < e’ must be equal to either true or false), and
no two elements of sort Bool that were semantically different in BOOL can be equated
in TOSET.

th QUEUE[X :: TRIV] is protecting BOOL .
  sorts NeQueue Queue .
  subsorts Elt < NeQueue < Queue .
  op empty : Queue .
  op get : NeQueue -> Elt .
  op rest : NeQueue -> Queue .
  op insert : Elt Queue -> NeQueue .
  op eq : Queue Queue -> Bool .
  var Q : NeQueue .
  eq eq(empty, empty) == true .
  eq eq(insert(E, Q), empty) == false .
  eq eq(insert(E, Q), insert(E', Q')) ==
     (E == E') and eq(Q, Q') .
  eq eq(insert(get(Q), rest(Q)), Q) == true .
endt


In the theory of queues, the declaration NeQueue < Queue implies that get and
rest are total on their domain. An alternative is

  var Q
  mb
  Q : Queue
  NeQueue

th PRIOQUE[X :: TRIV, Y :: POSET[X]] is extending
     PAIR[X, QUEUE[X]] .
  op extract : NeQueue -> Pair .
  op _<_ : Elt Queue -> Bool .
  var Q : NeQueue .
  var E E' : Elt .
  eq E < nil == true .
  eq E < insert(E', Q) == E <.Y E' and E < Q .
  eq extract(insert(E, Q)) == pair(E, Q) if E < Q .
  eq extract(insert(E, Q)) == pair(1st(extract(Q)),
       insert(E, 2nd(extract(Q)))) if E < Q == false .
endt


Note how models of PRIOQUE alternate loose interpretations (of TRIV, QUEUE and
PRIOQUE) with initial interpretations (of PAIR and BOOL). The role of PAIR is to
provide a polymorphic pairing construct.


Specification of an abstract sorting algorithm based on priority queues.


th LIST[X :: TRIV] is protecting BOOL .
  sorts NeList List .
  subsorts Elt < NeList < List .
  op nil : List .
  op __ : List List -> List [assoc id: nil] .
  op head : NeList -> Elt .
  op tail : NeList -> List .
  var E E' : Elt .
  var L L' : List .
  eq head(E L) == E .
  eq tail(E L) == L .
  mb L L' : NeList if L : NeList or L' : NeList .
endt

th ORDLIST[X :: TRIV, Y :: POSET[X], Z :: LIST[X]] is
  sorts NeOList OList .
  subsorts NeOList < OList < List .
  subsorts NeOList < NeList .
  op sorted : List -> Bool .
  op sort : List -> OList .
  var L L' L'' : List .
  var E E' : Elt .
  eq sorted(nil) == true .
  eq sorted(E) == true .
  eq sorted(E E' L) == E < E' and sorted(E' L) .
  mb nil : OList .
  mb L : NeOList if sorted(L) and L : NeList .
  eq sort(L E L' E' L'') == sort(L E' L' E L'') .
  eq sort(L) == L if sorted(L) .
endt

Note the subtle use of associativity and identity of concatenation in specifying sort 
and sorted. 


obj SORT[X :: TRIV, Y :: POSET[X], Z :: PRIOQUE[X, Y]] is
  op sort : Queue -> OList .
  var Q : NeQueue .
  eq sort(empty) == nil .
  eq sort(Q) == 1st(extract(Q)) sort(2nd(extract(Q))) .
endo


Concrete algorithms for sorting elements of an ordered set.

view QLIST[X :: TRIV] of LIST[X] as QUEUE[X]
  sort Queue to List .
  sort NeQueue to NeList .
  op empty to nil .
  op get to head .
  op rest to tail .
  op insert to __ .
endv


This kind of typing assertion implies proof obligations to be checked by the user.
Here, the equations given for insert, get and rest must be verified for their
interpretation in LIST. We now construct specific priority queues as views to instantiate the
abstract sorting algorithm.

view PRIOQUE1[X :: TRIV, Y :: POSET[X]] of
     PAIR[X, QLIST[X]] as PRIOQUE[X, Y]
  var L L' : Queue .
  var E : Elt .
  op extract(L E L') to pair(E, L L')
     if E < L and E < L' .
  op insert(E, L) to E L .
endv

view PRIOQUE2[X :: TRIV, Y :: POSET[X]] of
     PAIR[X, ORDLIST[X, QLIST[X]]] as PRIOQUE[X, Y]
  var L : NeOList .
  var L' : OList .
  var E E' E'' : Elt .
  op extract(L) to pair(head(L), tail(L)) .
  op insert : Elt List -> NeList .
  eq insert(E, nil) == E .
  eq insert(E, E') == E E' if E < E' .
  eq insert(E, L E' E'' L') == L E' E E'' L'
     if E' < E and E < E'' .
endv


The module SORT[X, Y, PRIOQUE1[X, Y]] and the module SORT[X, Y,
PRIOQUE2[X, Y]] both inherit a sorting algorithm still parameterized by X, a set, 
and Y, an order on that set. Applying further to, for example, the built-in module NAT 
of natural numbers having the usual ordering on natural numbers, will generate objects 
in which we can run the obtained sorting algorithms. 








5 Priority Queues in ML 


The ML version of our example is given in the Caml dialect. It is divided into
four parts: the definition of all needed signatures, a simple implementation of priority
queues as unsorted lists ListPQ, an implementation of sorting by an abstract priority
queue PQSort and composition of both implementations into a sorting module Sort.

The first file contains the signatures of an ordered type (consisting of a type and an 
ordering function), a priority queue and a sorting algorithm. The latter two declare a 
submodule E defining the ordering. 





module type OrderedType =
  sig
    type t
    (* The type of elements *)
    val compare : t -> t -> int
    (* compare a b is smaller than 0 if a is smaller than b, 0 if a=b, and is
       larger than 0 if a is larger than b *)
  end

module type PrioQueSig =
  sig
    module E : OrderedType
    (* The type and ordering of the elements of the queue *)
    type t
    (* The type of priority queues *)
    (* Operations: *)
    val create : t
    val insert : t -> E.t -> t
    val extract : t -> t * E.t
    (* raises Not_found if the queue is empty *)
  end

module type SortSig =
  sig
    module E : OrderedType
    (* The type and ordering of the elements to sort *)
    val sort : E.t list -> E.t list
    (* The sorting function *)
  end





The second file contains the definition of a priority queue based on unordered lists. 
We skip the (straightforward) implementation here; the only interesting thing is the
functor’s header: 


module ListPQ (O: OrderedType)
  : PrioQueSig with module E=O





which says that the module ListPQ is a functor, taking an order O as parameter and 
returning a priority queue where the ordering is the same as in O. Note that since the 
output signature of this functor is given, its users will only have access to types and 
functions specified in this signature. Other types and functions are treated as local and 
implementation-specific, and are therefore inaccessible.

The third element is the abstract algorithm, whose implementation is also trivial.
Again the interesting part is the functor’s header, which can have two possible forms.
The first one is the following:


module PQSort1 (O: OrderedType)
               (PQ: PrioQueSig with module E=O)
  : SortSig with module E=O


The final sorting algorithm can then be obtained in OCaml in the
following way:


module Sort1 (O: OrderedType)
  : SortSig with module E=O
  = PQSort1(O)(ListPQ(O))


The module’s output signature is the signature of sorting with respect to the argu- 
ment ordering. Its implementation is simply the composition of existing algorithms, all 
this under the abstraction with respect to the argument ordering. 

There is also a second way of writing the header of the abstract priority queue 
sorting algorithm: 


module type PQFunctSig
  = functor (O': OrderedType)
    -> PrioQueSig with module E=O'


module PQSort2 (O: OrderedType) (PQF: PQFunctSig)
  : SortSig with module E=O


The above code fragment consists of two parts: first the functor type is defined,
which corresponds exactly to the specification of ListPQ. Then the sorting algorithm
is presented as a higher-order functor, i.e. a functor which itself takes a functor as a
parameter. Of course, the first line of PQSort2 is the application of PQF to O in order
to get the priority queue PQ, and from this point on the code of both functors is identical.

Higher-order functors are not available in OBJ.

In order to obtain the final sorting algorithm, one applies PQSort2 to ListPQ:


module Sort2 (O: OrderedType)
  : SortSig with module E=O
  = PQSort2(O)(ListPQ)


The first approach to composing modules is more general than the second, because 
one does not necessarily have to use a generic priority queue functor. Consequently the 
use of data structures specialized to a given type is possible (e.g. if a set of values is 
finite a priority queue can be based on counting elements). 

On the other hand, the higher-order functor may correspond better to the intended 
way the programmer wishes to use a given part of code in the whole program. This is 
exactly our case, since we want to compose PQSort with the generic ListPQ functor.

Of course it is possible to get the advantages of both approaches: write the most
general specification, as in PQSort1, and then wrap it in a higher-order functor,
presenting the intentions of the programmer:


module PQSort2' (O: OrderedType) (PQF: PQFunctSig)
  : SortSig with module E=O
  = PQSort1(O)(PQF(O))
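
For readers who want to run the composition, the following self-contained sketch fills in the bodies that the text skips. The signatures and functor headers follow the ones given in this section (comments removed); the bodies of ListPQ and PQSort1, and the test module IntOrder, are our own assumptions about what the "straightforward" implementations could look like, not the code of the original development.

```ocaml
module type OrderedType = sig
  type t
  val compare : t -> t -> int
end

module type PrioQueSig = sig
  module E : OrderedType
  type t
  val create : t
  val insert : t -> E.t -> t
  val extract : t -> t * E.t  (* raises Not_found if the queue is empty *)
end

module type SortSig = sig
  module E : OrderedType
  val sort : E.t list -> E.t list
end

(* Assumed body: a priority queue kept as an unsorted list. *)
module ListPQ (O : OrderedType) : PrioQueSig with module E = O = struct
  module E = O
  type t = E.t list
  let create = []
  let insert q x = x :: q
  let extract = function
    | [] -> raise Not_found
    | y :: ys ->
      (* m is the minimum seen so far, rest holds all other elements *)
      let m, rest =
        List.fold_left
          (fun (m, rest) z ->
             if E.compare z m < 0 then (z, m :: rest) else (m, z :: rest))
          (y, []) ys
      in
      (rest, m)
end

(* First-order form: the queue module itself is a parameter. *)
module PQSort1 (O : OrderedType) (PQ : PrioQueSig with module E = O)
  : SortSig with module E = O = struct
  module E = O
  let sort xs =
    let q = List.fold_left PQ.insert PQ.create xs in
    let rec drain q acc =
      match PQ.extract q with
      | (q', x) -> drain q' (x :: acc)
      | exception Not_found -> List.rev acc
    in
    drain q []
end

(* Higher-order form: the parameter is a functor that builds queues. *)
module type PQFunctSig =
  functor (O' : OrderedType) -> (PrioQueSig with module E = O')

module PQSort2' (O : OrderedType) (PQF : PQFunctSig)
  : SortSig with module E = O = PQSort1 (O) (PQF (O))

module Sort1 (O : OrderedType) : SortSig with module E = O =
  PQSort1 (O) (ListPQ (O))

module Sort2 (O : OrderedType) : SortSig with module E = O =
  PQSort2' (O) (ListPQ)

(* A hypothetical ordered type used only to exercise the functors. *)
module IntOrder = struct
  type t = int
  let compare (a : int) b = compare a b
end

module IntSort1 = Sort1 (IntOrder)
module IntSort2 = Sort2 (IntOrder)

let () =
  assert (IntSort1.sort [4; 5; 1; 2] = [1; 2; 4; 5]);
  assert (IntSort2.sort [4; 5; 1; 2] = [1; 2; 4; 5])
```

Because ListPQ is sealed with its output signature, PQSort1 cannot depend on the list representation; replacing ListPQ by a heap-based functor with the same header would leave Sort1 and Sort2 unchanged.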


6 Priority Queues in Coq


The structure of the Coq development is the same as in ML, but the signatures now 
contain formal specifications, and structures contain proofs of desired properties. 

The first file, as in ML, contains the definition of all needed signatures. The
signatures are preceded by the definition of the type of a three-valued proof-carrying
comparison: the type comparison t < = a b is for example inhabited by Lt p, where
p is a proof of the property a < b.

Inductive comparison (X : Set) (lt eq : X -> X -> Prop) (x y : X) : Set :=
  | Lt : lt x y -> comparison X lt eq x y
  | Eq : eq x y -> comparison X lt eq x y
  | Gt : lt y x -> comparison X lt eq x y.


Module Type OrderedType.

  Parameter t : Set.

  Parameter eq : t -> t -> Prop.
  Parameter lt : t -> t -> Prop.

  Parameter compare : forall x y : t, comparison t lt eq x y.

  Axiom eq_refl : forall x : t, eq x x.
  Axiom eq_sym : forall x y : t, eq x y -> eq y x.
  Axiom eq_trans : forall x y z : t, eq x y -> eq y z -> eq x z.

  Axiom lt_trans : forall x y z : t, lt x y -> lt y z -> lt x z.
  Axiom lt_not_eq : forall x y : t, lt x y -> ~ eq x y.

  Hint Immediate eq_sym.
  Hint Resolve eq_refl eq_trans lt_not_eq lt_trans.

End OrderedType.


Module Type PrioQueSig.

  (* Declarations *)
  Declare Module E : OrderedType.
  Parameter t : Set.
  Parameter create : t.
  Parameter insert : t -> E.t -> t.
  Parameter extract : t -> option (t * E.t).

  (* Specification - auxiliary functions and predicates *)
  Parameter number : t -> E.t -> nat.
  Definition empty q : Prop := forall x, number q x = 0.

  (* Queues are similar iff q1 = q2 + {x} *)
  Definition similar (q1 q2 : t) (x : E.t) : Prop :=
    (forall y : E.t, ~ E.eq x y -> number q1 y = number q2 y)
    /\ (forall y : E.t, E.eq x y -> number q1 y = S (number q2 y)).

  (* Specification of operations *)
  Axiom create_empty : empty create.
  Axiom insert_similar :
    forall (q : t) (x : E.t), similar (insert q x) q x.
  Axiom extract_similar :
    forall (q q2 : t) (x : E.t),
      extract q = Some (q2, x) -> similar q q2 x.
  Axiom extract_minimal :
    forall (q q2 : t) (x y : E.t),
      extract q = Some (q2, x) -> E.lt y x -> number q y = 0.
  Axiom extract_empty_none :
    forall q : t, extract q = None -> empty q.

End PrioQueSig.


Module Type SortSig.
  Declare Module E : OrderedType.
  Parameter sort : list E.t -> list E.t.

  Definition le e1 e2 := E.lt e1 e2 \/ E.eq e1 e2.
  Axiom sort_sorted : forall l : list E.t, Sorting.sort le (sort l).

  Axiom eq_dec : forall e1 e2 : E.t, {E.eq e1 e2} + {~ E.eq e1 e2}.
  Axiom sort_permut :
    forall l : list E.t, Permutation.permutation E.eq eq_dec l (sort l).

End SortSig.


The signature OrderedType, taken from [13], contains the same calculable elements 
as its ML counterpart, but is constructed differently. Its main elements are the type and 
the equality and ordering predicates (i.e. logical elements). The function compare is 
only an addition to the predicates. Instead of an int, the compare function returns 
an element of the comparison type defined earlier, i.e. the ordering decision together 
with the proof that the decision is right. 

Apart from this, the OrderedType signature contains axioms specifying the prop- 
erties of ordering and equality and hints to instrument automatic tactics, trying to prove 
properties concerned with the order. The latter element is of course not part of the type 
theory. 

The priority queue signature is also divided into two parts: declarations and
specifications. The declarations contain the same elements as in ML with the only exception
of the extract function, which returns an option type, i.e. Some value if the queue is
not empty and None otherwise (instead of raising an exception). Note, however, that in
order to specify the queue operations one must declare an additional function, number,
counting the number of occurrences of a given element in the queue. Based on this function,
two predicates empty and similar can easily be defined in order to write the purely
logical axioms specifying how create, insert and extract work.
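
To make the role of number and similar concrete, here is an executable OCaml approximation, specialised to integer elements and a list-based queue. Everything in it is an assumption made for illustration: the names queue and candidates, the list representation, and above all the fact that similar is checked over a finite list of candidate elements rather than quantified over all elements as in the Coq definition. It can only test the axioms on samples; it proves nothing.

```ocaml
(* A list-based queue of integers, for illustration only. *)
type queue = int list

let insert (q : queue) (x : int) : queue = x :: q

(* Returns the remaining queue and a minimal element, or None. *)
let extract (q : queue) : (queue * int) option =
  match List.sort compare q with
  | [] -> None
  | m :: _ ->
    (* remove one occurrence of the minimum m *)
    let rec remove = function
      | [] -> []
      | y :: ys -> if y = m then ys else y :: remove ys
    in
    Some (remove q, m)

(* number q x: how many times x occurs in q *)
let number (q : queue) (x : int) : int =
  List.length (List.filter (fun y -> y = x) q)

(* similar ys q1 q2 x: q1 = q2 + {x}, checked for the candidates ys only *)
let similar (ys : int list) (q1 : queue) (q2 : queue) (x : int) : bool =
  List.for_all
    (fun y ->
       if y = x then number q1 y = number q2 y + 1
       else number q1 y = number q2 y)
    ys

let candidates = [0; 1; 2; 3; 4; 5]
let q : queue = [4; 1; 4; 2]

let () =
  (* sample instance of insert_similar *)
  assert (similar candidates (insert q 4) q 4);
  (* sample instances of extract_similar and extract_minimal *)
  match extract q with
  | None -> assert false
  | Some (q2, x) ->
    assert (similar candidates q q2 x);
    assert (List.for_all (fun y -> y >= x || number q y = 0) candidates)
```

The gap between such sample checks and the universally quantified axioms is precisely what the Coq proofs close.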

The signature of a sorting algorithm is simply an extension of its ML counterpart by 
the logical axioms, saying that the list resulting from sorting is sorted and is a
permutation of the input list. The Sorting.sort and Permutation.permutation
predicates from the Coq standard library need additional elements such as the
less-than-or-equal predicate le or the equality decidability property eq_dec.

In the second file, the header of ListPQ is the following:


Module ListPQ (O: OrderedType) <: (PrioQueSig with Module E:=O). 


The difference between the ML and Coq versions of this functor is the way the resulting 
module type is declared. The Coq syntax Module M <: SIG means that the type checker 
should check that the principal signature of M is included in SIG and the users of M 
are allowed to use all the information inferred in its principal signature. We say that 
this module type annotation is transparent as opposed to the opaque one that was used 
in the ML version. The fact that the transparent annotation is used is only important for
evaluation of programs inside Coq, such as Eval compute in (sort l), see below. Thanks 
to transparency the reduction mechanism can see the definitions of all functions and 
evaluate them. For typechecking reasons the opaque module type annotations would be 
equally good. 

In Coq, we also have two possibilities of writing the PQSort functor. The header of
the first-order one is as follows:


Module PQSort1 (O: OrderedType)
               (PQ: PrioQueSig with Module E := O)
  <: SortSig with Module E := O.


Unfortunately, due to the requirement that functors are applied only to names of
modules, and the lack of local module bindings, the composition of PQSort1 and ListPQ is
somewhat lengthy:


Module Sort1 (O: OrderedType) <: (SortSig with Module E:=O).
  Module ListPQ_O := ListPQ O.
  Module PQSort_O := PQSort1 O ListPQ_O.
  (* Include PQSort_O. *)
  Module E := PQSort_O.E.
  Definition sort := PQSort_O.sort.
  Definition le := PQSort_O.le.
  Definition sort_sorted := PQSort_O.sort_sorted.
  Definition eq_dec := PQSort_O.eq_dec.
  Definition sort_permut := PQSort_O.sort_permut.
End Sort1.


Now we can apply the functor to an example module NatOrder and test the sorting:


Module NatSort1 <: (SortSig with Module E:=NatOrder)
  := Sort1 NatOrder.
Eval compute in (NatSort1.sort (4::5::1::2::nil)).


The higher-order way of writing PQSort


Module Type PQFunctSig (O' : OrderedType)
  := PrioQueSig with Module E := O'.


Module PQSort2 (O: OrderedType) (PQF: PQFunctSig)
  <: SortSig with Module E := O.


starting with the creation of the priority queue for O:


  Module PQ := PQF O.

leads to a much simpler composition code:


Module Sort2 (O: OrderedType)
  <: SortSig with Module E:=O
  := PQSort2 O ListPQ.


Unfortunately, due to a certain weakness of the Coq module system with respect to 
transparency of higher-order functors, the instances of the PQSort2 functor cannot be 
evaluated inside Coq. However, the ML code extracted from both functors can of course 
be evaluated without any problems. 

To summarize, it is interesting to compare the size of ML and Coq code. It follows 
that Coq signatures with specifications by logical formulas are about 2-3 times longer 
than their commented ML counterparts. Unfortunately, the implementations, which in 
Coq contain proofs of required properties, are about 10-20 times longer than the corre- 
sponding ML code. 


7 Conclusion 


We have presented three languages which integrate specification and implementation. 
With the simple example of an abstract sorting algorithm based on a priority queue, we 
have demonstrated how each of the three languages can be used for programming in the large
by writing specifications, implementations and by composing abstract components. In 
particular, we want to stress that parameterization should be available for all kinds of 
modules. 

We have seen that the most important concepts of the OBJ modules are still present 
in more recent systems such as ML and Coq. Indeed, OBJ objects correspond to struc- 
tures, parametric objects to functors and OBJ theories to signatures. Only the parametric 
OBJ theories do not have direct representatives in the ML and Coq module systems, but 
abstract signatures can easily be refined to concrete ones using the “with” notation. On 
the other hand, higher-order modules are lacking in OBJ. Although they are not much 
used in practice, our example shows their adequacy for describing dependencies on other
parametric components.

Concerning the ability of these languages to specify and implement software
components, OBJ lies somewhere between ML and Coq. In ML, specifications are
simply given as types for functions, and execution is based on an efficient call-by-value
evaluation strategy. In OBJ, one can write first-order equational and membership
specifications that are executable via an efficient built-in associative-commutative
rewriting mechanism guided by user-defined strategies. In Coq, the specification language
is higher-order predicate logic, which is by far the most expressive of the three. This
makes it possible to write a specification, implement it, prove that the implementation
is correct, run the implementation inside Coq and even extract the program into
executable ML code. Some of these steps may of course involve complex, lengthy machine
computations.

The question arises of which language is best suited for fast prototyping. If no
verification is needed, the answer would probably be ML. Separating signatures from their
actual implementation is very neat, and allows a two-step development methodology
which does not require much interaction between these two phases unless there are
major design errors. Because OBJ modules provide at the same time an interface
and logical requirements for the interface, specification and coding are no longer clearly
separated. The development process becomes more complicated, going back and forth
between different pieces of the code. A comparison with Coq is more difficult, since
Coq gives you a lot more: while it is possible in OBJ to forget about the proof
obligations generated when typing modules, this is not the case with Coq. A consequence is
that every change requires tedious adjustments of the proofs.

Acknowledgments: We thank Andrzej Gasienica-Samek and Tomasz Stachowicz 
for their help with the Coq development, Pierre-Yves Strub for checking preliminary 
versions of the OBJ development in Maude, and the referee for many valuable com- 

Abstract. Adhesive high-level replacement (HLR) systems have been
recently introduced as a new categorical framework for graph
transformation in the double pushout (DPO) approach. They combine the
well-known concept of HLR systems with the concept of adhesive categories
introduced by Lack and Sobociński. 

While graphs, typed graphs, attributed graphs and several other variants 
of graphs together with corresponding morphisms are adhesive HLR cat- 
egories, such that the categorical framework of adhesive HLR systems 
can be applied, this has been claimed also for Petri nets. In this paper 
we show that this claim is wrong for place/transition nets and algebraic 
high-level nets, although several results of the theory for adhesive HLR 
systems are known to be true for the corresponding Petri net transfor- 
mation systems. 

In fact, we are able to define a weaker version of adhesive HLR categories, 
called weak adhesive HLR categories, which is still sufficient to show all 
the results known for adhesive HLR systems. This concept includes not 
only all kinds of graphs mentioned above, but also place/transition nets, 
algebraic high-level nets and several other kinds of Petri nets. For this 
reason weak adhesive HLR systems can be seen as a unifying framework 
for graph and Petri net transformations. 


1 Introduction 


The use of categorical techniques for unifying frameworks in Computer Science 
has a long tradition. In the early 1970s the concept of closed monoidal
categories was proposed by Goguen in [1] as a unifying framework for different
kinds of deterministic automata. An extension of this framework to
nondeterministic and stochastic automata using pseudo-closed categories was
presented in [2]. Other important examples are the unifying frameworks of
institutions and specification frames respectively. The first framework is based on
a categorical treatment of signatures, models and sentences introduced by Goguen
and Burstall [3], and the second one in [4] directly combines signatures and
sentences into specifications. In both cases we obtain a unifying framework for all
kinds of algebraic and logical specification techniques.

Most recently the unifying framework of adhesive high-level replacement
(HLR) systems for different kinds of graph transformation systems has been
introduced in [5] [6]. The corresponding concept of adhesive HLR categories
integrates those of HLR categories in [7] and adhesive categories by Lack and
Sobociński [8], which was later extended to quasiadhesive categories [9]. The
concept of adhesive categories requires the existence of pushouts along
monomorphisms and pullbacks, and the property that pushouts along monomorphisms
are van Kampen (VK) squares. Roughly speaking, the last property means that
pushouts are stable under pullbacks and, vice versa, pullbacks are stable under
combined pushouts and pullbacks. In the case of adhesive HLR categories the
class of all monomorphisms is replaced by a subclass M of monomorphisms
closed under composition and decomposition, and the existence of all pullbacks
by that of pullbacks along M-morphisms. In [5] [6] it is shown that there is a
unifying framework of adhesive HLR systems for graph transformation systems based
on the double pushout (DPO) approach [10] concerning a large variety of different
graph concepts, like labeled graphs, typed graphs, attributed graphs, typed
attributed graphs and hypergraphs. The key idea is to show that adhesive HLR
categories satisfy a number of different properties, called HLR properties, which
are used in [7] to prove important results like the Local Church-Rosser Theorem,
the Parallelism Theorem and the Concurrency Theorem. This was first shown
for adhesive categories in [8] for the class M of all monomorphisms and later
extended to adhesive HLR categories in [5] [6] and to quasiadhesive categories in
[9], where M is the class of all regular monomorphisms.


The idea to apply the DPO approach to Petri nets was first considered for 
place/transition nets in [7] and for algebraic high-level nets in [11]. In [5] we 
have claimed that the category (PTNets, M) of place/transition nets with the 
class M of all injective morphisms is an adhesive HLR category in order to 
apply the general theory of adhesive HLR systems also to place/transition nets. 
Unfortunately this claim is wrong, as we show in this paper. The reason is that
PTNets has general pullbacks, but pullbacks in general cannot be constructed
componentwise in Sets. However, pullbacks along monomorphisms in PTNets
can be constructed componentwise in Sets. This is the key idea to weaken the
concept of adhesive HLR categories using weak VK squares, such that (PTNets,
M) is a weak adhesive HLR category, and nevertheless this weaker concept
still allows us to verify the HLR properties used in [6] to prove, under some
additional assumptions, the following main results:

1. Local Church-Rosser Theorem, 

2. Parallelism Theorem, 

3. Concurrency Theorem, 

4. Embedding and Extension Theorem, 

5. Local Confluence Theorem - Critical Pair Lemma. 


In this paper we show for elementary nets, place/transition nets and algebraic 
high-level nets that they are weak adhesive HLR categories for a suitable class of 
morphisms. In [5] [6] we have shown already that adhesive HLR categories satisfy 
the HLR properties to prove the main results stated above. In this paper we show 


Weak Adhesive High-Level Replacement Categories and Systems 237 


that this is already true for weak adhesive HLR categories. This implies that the 
main results are also true for different kinds of Petri net transformation systems 
including elementary, place/transition and algebraic high-level nets. Note that,
in contrast to the "classical" theory of Petri nets and systems based on the token
game, where the structure of the nets remains unchanged, the theory of Petri net 
transformations allows not only the token game, but also to change the structure 
of the nets. In this sense weak adhesive HLR categories can be seen as a unifying 
framework not only for graph but also for Petri net transformations. 
This paper is organized as follows:

In Section 2 we review adhesive and adhesive HLR categories as introduced in [8]
and [5]. In Section 3 we extend these concepts to weak adhesive HLR categories
and systems. This is the basis to define Petri net transformation systems as an
instance of weak adhesive HLR systems in Section 4.


2 Review of Adhesive and Adhesive HLR Categories


The intuitive idea of adhesive categories is that of categories with suitable pushouts
and pullbacks which are compatible with each other. More precisely, the definition
is based on so-called van Kampen squares.

The idea of a van Kampen (VK) square is that of a pushout which is sta- 
ble under pullbacks, and vice versa that pullbacks are stable under combined 
pushouts and pullbacks. The name van Kampen derives from the relationship 
between these squares and the Van Kampen Theorem in topology (see [12]). 


Definition 1 (van Kampen square). A pushout (1) is a van Kampen square,
if for any commutative cube (2) with (1) in the bottom and the back faces being
pullbacks the following holds: the top face is a pushout iff the front faces are
pullbacks.

It might be expected that at least in the category Sets of sets and functions
each pushout is a van Kampen square. Unfortunately this is not true (see Ex.
1). But at least pushouts along monomorphisms (injective functions) are VK
squares (see [8] [9]).


Fact 1 (VK squares in Sets). In Sets, each pushout along a monomorphism 
is a VK square. Pushout (1) is called a pushout along a monomorphism, if m 
(or symmetrically f) is a monomorphism. 
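
As an illustration of Fact 1, the following Python sketch computes a pushout of finite sets and checks, on one small instance, that a pushout along an injective function is also a pullback (a consequence of the VK property). It is only a sketch under stated assumptions: the function name `pushout`, the tagging of elements with "B" and "C", and the example data are chosen here for illustration and do not come from the paper.

```python
def pushout(A, B, C, m, f):
    """Pushout of the span B <-m- A -f-> C in finite sets.

    m and f are dicts with domain A. Returns (D, g, n) with
    g : B -> D and n : C -> D such that g . m == n . f.
    """
    # Work on the disjoint union B + C; tags keep the two copies apart.
    parent = {("B", b): ("B", b) for b in B}
    parent.update({("C", c): ("C", c) for c in C})

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Identify m(a) with f(a) for every a in A.
    for a in A:
        root_b, root_c = find(("B", m[a])), find(("C", f[a]))
        if root_b != root_c:
            parent[root_b] = root_c

    g = {b: find(("B", b)) for b in B}
    n = {c: find(("C", c)) for c in C}
    D = set(g.values()) | set(n.values())
    return D, g, n


# Hypothetical example: m is an inclusion (injective), f collapses A to a point.
A, B, C = {0, 1}, {0, 1, 2}, {"x"}
m = {0: 0, 1: 1}
f = {0: "x", 1: "x"}
D, g, n = pushout(A, B, C, m, f)

commutes = all(g[m[a]] == n[f[a]] for a in A)
n_injective = len(set(n.values())) == len(C)
# Pullback of g and n, computed elementwise in Sets.
pullback = {(b, c) for b in B for c in C if g[b] == n[c]}
is_pullback = pullback == {(m[a], f[a]) for a in A} and len(pullback) == len(A)
print(len(D), commutes, n_injective, is_pullback)
```

The final check covers a single instance only; it does not establish the full van Kampen property, which quantifies over all commutative cubes.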


Example 1 (VK squares in Sets ). In the following diagram on the left hand side 
a VK square along an injective function in Sets is shown. All morphisms are 
inclusions, or 0 and 1 are mapped to x and 3 to 2. 

Arbitrary pushouts are stable under pullbacks in Sets. That means, one 
direction of the VK square property is also valid for arbitrary morphisms. But 
the other direction is not necessarily fulfilled. The cube on the right hand side 
is such a counterexample for arbitrary functions: all faces commute, the bottom 
and the top are pushouts and the back faces are pullbacks. But obviously the
front faces are not pullbacks, therefore the pushout in the bottom fails to be a
VK square.


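[Diagrams: a VK square along an injective function in Sets (left) and the
counterexample cube for arbitrary functions (right); surviving labels include
{0,1}, {0,1} x {0,1}, {0,1,2,3} and +mod2.]
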
In the following definition of adhesive categories only those VK squares of
Def. 1 are considered where m is a monomorphism. According to Lack and
Sobociński [8] we define


Definition 2 (adhesive category). A category C is an adhesive category, if 


1. C has pushouts along monomorphisms (i.e. pushouts, where at least one of 
the given morphisms is a monomorphism), 
2. C has pullbacks, 


3. pushouts along monomorphisms are VK squares. 


Let us first consider some basic examples and counterexamples for adhesive 
categories (see [8]). 

Fact 2 (Sets, Graphs, Graphs_TG as adhesive categories). The categories
Sets of sets and functions, Graphs of graphs and graph morphisms and
Graphs_TG of typed graphs and typed graph morphisms are adhesive categories.


Counterexample 2 (non-adhesive categories). For example, the category Posets
of partially ordered sets and the category Top of topological spaces and
continuous functions are not adhesive categories. In the following diagram a cube
in Posets is shown that fails to be a van Kampen square. The bottom is a
pushout with injective functions (monomorphisms) and all lateral faces are
pullbacks, but the top square is not a pushout in Posets. The proper pushout over
the corresponding morphisms is the square (1).


[Diagram: the cube in Posets, built from chains over the elements 0, 1, 2, 3,
and the proper pushout square (1).]

Remark 1. In [9] Lack and Sobociński have also introduced a variant of adhesive
categories, called quasiadhesive categories, where the class of monomorphisms in
Def. 2 is replaced by regular monomorphisms. A monomorphism is called regular,
if it is the equalizer of two morphisms. For adhesive and also for quasiadhesive
categories Lack and Sobociński have shown that all the HLR properties, shown
for adhesive HLR categories in Thm. 2 below, are valid. This allows one to prove
several important results of graph transformation systems in the framework of
adhesive and also of quasiadhesive categories. On the other hand adhesive and
also quasiadhesive categories are special cases of adhesive HLR categories (C, M)
(see Def. 3 below), where the class M is specialized to the class of all monos and
of all regular monos respectively.


The main difference between adhesive HLR categories and adhesive categories 
is that a distinguished class M of monomorphisms is considered instead of all 
monomorphisms, so that only pushouts along M-morphisms have to be VK 
squares. Moreover, only pullbacks along M-morphisms and not over arbitrary 
morphisms are required (see [5] [6]). 


Definition 3 (adhesive HLR category). A category C with a morphism 
class M is called an adhesive HLR category, if 


1. M is a class of monomorphisms closed under isomorphisms, composition
(f : A → B ∈ M, g : B → C ∈ M ⇒ g ∘ f ∈ M) and decomposition
(g ∘ f ∈ M, g ∈ M ⇒ f ∈ M),
2. C has pushouts and pullbacks along M-morphisms and M-morphisms are
closed under pushouts and pullbacks,
3. pushouts in C along M-morphisms are VK squares.


Remark 2. M-morphisms are closed under pushouts if, for a pushout (1) in Def. 1,
m ∈ M implies that n ∈ M. Analogously, M-morphisms are closed under
pullbacks if, for a pullback (1), n ∈ M implies that m ∈ M.


Example 3 (adhesive HLR categories). 


— All adhesive categories are adhesive HLR categories for the class M of all 
monomorphisms. 

— The category (HyperGraphs, M) of hypergraphs with the class M of 
injective hypergraph morphisms is an adhesive HLR category. 

— Another example for an adhesive HLR category is the category (Sig, M) of 
algebraic signatures with the class M of all injective signature morphisms. 

— The category (ElemNets, M) of elementary Petri nets with the class M of 
all injective Petri net morphisms is an adhesive HLR category (see Fact 3).

— An important example of an adhesive HLR category is the category 
(AGraphs_ATG, M) of typed attributed graphs with a type graph ATG
and the class M of all injective morphisms with isomorphisms on the data 
part. 














Counterexample 4 (non-adhesive HLR categories). The categories (PTNets,
M) of place/transition nets and (Spec, M) of algebraic specifications, where
M is the class of all the corresponding monomorphisms, fail to be adhesive HLR
categories (see Ex. 6).














3 Weak Adhesive HLR Categories and Systems 


As pointed out in Counterex. 4, the category (PTNets, M) of place/transition
nets with the class M of all monomorphisms fails to be an adhesive HLR
category. For this reason we now introduce a slightly weaker version, called weak
adhesive HLR category.

For a weak adhesive HLR category we only soften item 3 in Def. 3, so that
only special cubes are considered for the VK square property.


Definition 4 (weak adhesive HLR category). A category C with a mor- 
phism class M is called a weak adhesive HLR category, if 


1. M is a class of monomorphisms closed under isomorphisms, composition 
and decomposition, 

2. C has pushouts and pullbacks along M-morphisms and M-morphisms are 
closed under pushouts and pullbacks, 

3. pushouts in C along M-morphisms are weak VK squares, i.e. the VK square
property holds for all commutative cubes with m ∈ M and (f ∈ M or
b, c, d ∈ M) (see Def. 1).


Remark 3. For the weak version of the VK square property it is sufficient to
require f ∈ M or b, c, d ∈ M. In both cases this makes sure that the pullback
squares in the cube are pullbacks along M-morphisms.


Example 5 (weak adhesive HLR categories). 


— All adhesive HLR categories are weak adhesive HLR categories. 

— The category (PTNets, M) of place/transition nets with the class M of all 
monomorphisms is a weak adhesive HLR category (see Fact 4).

— Similarly the category AHLNets(SP, A) of algebraic high-level nets with 
fixed specification SP and algebra A considered with the class M of injective 
morphisms is a weak adhesive HLR category (see Fact 5).

— An interesting example of high-level structures, which are not graph-like, are 
algebraic specifications (see [13]). The category (Spec, M_strict) of algebraic
specifications with the class M_strict of all strict injective specification
morphisms is a weak adhesive HLR category.














Similar to adhesive HLR categories also weak adhesive HLR categories are 
closed under product, slice, coslice, functor and comma category constructions. 
That means we can construct new weak adhesive HLR categories from given 
ones. 


Theorem 1 (construction of weak adhesive HLR categories). Weak ad- 
hesive HLR categories can be constructed as follows: 


1. If (C, M1) and (D, M2) are weak adhesive HLR categories, then the product
category (C × D, M1 × M2) is a weak adhesive HLR category.

2. If (C, M) is a weak adhesive HLR category, so are the slice category (C\X,
M ∩ C\X) and the coslice category (X\C, M ∩ X\C) for any object X
in C.

3. If (C, M) is a weak adhesive HLR category, then for every category X the
functor category ([X, C], M-functor transformations) is a weak adhesive
HLR category. An M-functor transformation is a natural transformation
t : F → G where all morphisms t_X : F(X) → G(X) are in M.

4. If (A, M1) and (B, M2) are weak adhesive HLR categories and F : A → C,
G : B → C are functors, where F preserves pushouts along M1-morphisms
and G preserves pullbacks (along M2-morphisms), then the comma category
(ComCat(F, G; I), M) with M = (M1 × M2) ∩ Mor_ComCat(F,G;I) is a weak
adhesive HLR category.


In the following theorem we show several important properties for weak
adhesive HLR categories, which are essential to prove the main results in Cor. 1.
These properties have been required as HLR properties in [7] to show some of
the main results for HLR systems. In [8], it was shown already that these HLR
properties are valid for adhesive categories. They were extended to adhesive HLR
categories in [5], and now also to weak adhesive HLR categories using almost
the same proofs.


Theorem 2 (properties of weak adhesive HLR categories). Given a weak 
adhesive HLR category (C, M), then the following properties hold: 


1. Pushouts along M-morphisms are pullbacks: Given the following pushout
(1) with k ∈ M, then (1) is also a pullback.

2. M pushout-pullback decomposition lemma: Given the following commutative
diagram with (1)+(2) being a pushout, (2) a pullback, w ∈ M and
(l ∈ M or k ∈ M). Then (1) and (2) are pushouts and also pullbacks.

3. Cube pushout-pullback lemma: Given the following commutative cube (3),
where all morphisms in the top and in the bottom are in M, the top is a
pullback and the front faces are pushouts. Then we have: the bottom is a
pullback iff the back faces of the cube are pushouts.

4. Uniqueness of pushout complements: Given k : A → B ∈ M and s : B → D,
then there is up to isomorphism at most one C with l : A → C and u : C → D
such that (1) is a pushout.


Now we are able to generalize graph transformation systems, grammars and 
languages in the sense of based on the category Graphs to weak adhesive 
HLR categories, which was already done for HLR, adhesive and adhesive HLR 
categories in [7], [8] and [5] respectively. 

In general, a weak adhesive HLR system is based on productions, also called 
rules, that describe in an abstract way how objects in this system can be trans- 
formed. An application of a production is called direct transformation and de- 
scribes how an object is actually changed by the production. A sequence of these 
applications yields a transformation. 


Definition 5 (production and transformation). Given a weak adhesive
HLR category (C, M), a production p = (L ←l− K −r→ R) (also called rule)
consists of three objects L, K and R called left hand side, gluing object and right
hand side respectively, and morphisms l : K → L, r : K → R with l, r ∈ M.

Given a production p = (L ←l− K −r→ R) and an object G with a morphism
m : L → G, called match, a direct transformation G =(p,m)⇒ H from G to an
object H is given by the following diagram, where (1) and (2) are pushouts.


    L ←--l--- K ---r--→ R
    |         |         |
  m |   (1)   |   (2)   |
    ↓         ↓         ↓
    G ←--f--- D ---g--→ H


A sequence G0 ⇒ G1 ⇒ ... ⇒ Gn of direct transformations is called a
transformation and is denoted as G0 ⇒* Gn. For n = 0, we have the identical
transformation G0 =(id)⇒ G0, i.e. f = g = id_G0. Moreover, we allow for n = 0 also
isomorphisms G0 ≅ G0', because pushouts and hence also direct transformations
are only unique up to isomorphism.
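
To make the two pushouts of Definition 5 concrete, here is a minimal Python sketch of one direct transformation in the simplest instance, the category Sets with M the injective functions. It additionally assumes that the match m is injective, so that the pushout complement D always exists; the function name `direct_transformation`, the "old"/"new" tags and the example data are illustrative assumptions, not notation from the paper.

```python
def direct_transformation(L, K, R, l, r, G, m):
    """One double-pushout step G => H on finite sets.

    Assumes l : K -> L, r : K -> R and the match m : L -> G are injective
    dicts, so the pushout complement always exists in this sketch.
    """
    for name, fn in (("l", l), ("r", r), ("m", m)):
        if len(set(fn.values())) != len(fn):
            raise ValueError(f"{name} must be injective in this sketch")

    # Pushout (1): D is G without the image of the deleted part L \ l(K).
    deleted = {m[x] for x in L - set(l.values())}
    D = G - deleted
    k = {x: m[l[x]] for x in K}      # k : K -> D
    f = {d: d for d in D}            # f : D -> G is an inclusion

    # Pushout (2): H is D plus a fresh copy of the created part R \ r(K).
    created = R - set(r.values())
    H = {("old", d) for d in D} | {("new", x) for x in created}
    g = {d: ("old", d) for d in D}   # g : D -> H
    r_inverse = {r[x]: x for x in K}
    n = {x: (("new", x) if x in created else g[k[r_inverse[x]]]) for x in R}
    return D, H, f, g, k, n


# Hypothetical production: keep "a", delete "b", create "c".
L, K, R = {"a", "b"}, {"a"}, {"a", "c"}
l, r = {"a": "a"}, {"a": "a"}
G = {1, 2, 3}
m = {"a": 1, "b": 2}

D, H, f, g, k, n = direct_transformation(L, K, R, l, r, G, m)
print(sorted(D), sorted(H))
```

In richer categories such as Graphs the pushout complement need not exist for every match (a gluing condition has to hold); that case is deliberately left out of this sketch.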


Definition 6 (weak adhesive HLR system, grammar and language). A 
weak adhesive HLR system AHS = (C, M, P) consists of a weak adhesive HLR 
category (C, M) and a set of productions P. 

A weak adhesive HLR grammar AHG = (AHS, S) is a weak adhesive HLR 
system together with a distinguished start object S. 

The language L of a weak adhesive HLR grammar is defined by L = {G | ∃
transformation S ⇒* G}.





In [5] [6] it is shown that the HLR properties stated in Thm. 2 together
with binary coproducts compatible with M are sufficient to prove the following
main results for adhesive HLR systems. Hence we also have the following main
results for weak adhesive HLR systems, which are stated explicitly in [7] for
HLR systems and in [5] [6] for adhesive HLR systems.


Corollary 1 (main results for weak adhesive HLR systems). Given a
weak adhesive HLR system with binary coproducts compatible with M (i.e. f, g ∈
M ⇒ f + g ∈ M), then we have the following results:


1. Local Church-Rosser Theorem, 
2. Parallelism Theorem, 
3. Concurrency Theorem. 


The Local Church-Rosser Theorem allows one to apply two graph
transformations G ⇒ H1 via p1 and G ⇒ H2 via p2 in an arbitrary order leading to
the same result H, provided that they are parallel independent. In this case they
can also be applied in parallel, leading to a parallel graph transformation G ⇒ H
via the parallel production p1 + p2. This second main result is called the
Parallelism Theorem. The Concurrency Theorem is concerned with the simultaneous
execution of causally dependent transformations.


4 Petri Net Transformation Systems 


Petri net transformation systems were first introduced in [7] for the case
of low-level nets and in [11] for high-level nets using the algebraic presentation
of Petri nets as monoids as introduced in [14]. The main idea of Petri net
transformation systems is to extend the well-known theory of Petri nets based on
the token game by general techniques which also allow changing the net structure
of Petri nets. In [15], a systematic study of Petri net transformation systems has
been presented in the categorical framework of abstract Petri nets, which can
be instantiated to different kinds of low-level and high-level Petri nets. In this
section we show that the category (ElemNets, M) of elementary Petri nets is
an adhesive HLR category (see Fact 3) and that the categories (PTNets, M) of
place/transition nets and (AHLNets(SP, A), M) of algebraic high-level nets
over (SP, A) are weak adhesive HLR categories (see Facts 4 and 5). The
corresponding instantiations of weak adhesive HLR systems lead to different kinds
of Petri net transformation systems.

In the following we present a simple grammar ENGG (elementary net graph
grammar) for elementary Petri nets, which allows us to generate all elementary
nets. The start net S of ENGG is empty. We have a production addPlace to
create a new place p and productions addTrans(n,m) for n, m ∈ ℕ to create a
transition with n input and m output places.


[Figure: the productions addPlace and addTrans(n,m).]

The grammar ENGG can be modified to a grammar PTGG (place/transition
net graph grammar) for place/transition nets if we replace the productions
addTrans(n,m) by productions addTrans(n,m)(i1, ..., in, o1, ..., om), where
i1, ..., in resp. o1, ..., om correspond to the arc weights of the input places
p1, ..., pn resp. the output places q1, ..., qm.
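
The effect of the ENGG productions can be sketched in Python as plain functions on a net representation. This is only an illustration of what applying addPlace and addTrans(n,m) yields, not the double-pushout construction itself; the dictionary layout and the names `empty_net`, `add_place` and `add_trans` are assumptions made here, and the match of a production is modelled simply by naming the existing places to connect.

```python
def empty_net():
    """Start net S of the grammar: no places, no transitions."""
    return {"P": frozenset(), "pre": {}, "post": {}}


def add_place(net, p):
    """Result of applying addPlace: one fresh place p is created."""
    if p in net["P"]:
        raise ValueError(f"place {p!r} already exists")
    return {"P": net["P"] | {p}, "pre": dict(net["pre"]), "post": dict(net["post"])}


def add_trans(net, t, inputs, outputs):
    """Result of applying addTrans(n,m) with n = len(inputs), m = len(outputs).

    inputs and outputs name existing places; they play the role of the match.
    """
    if t in net["pre"]:
        raise ValueError(f"transition {t!r} already exists")
    missing = (set(inputs) | set(outputs)) - net["P"]
    if missing:
        raise ValueError(f"unknown places: {sorted(missing)}")
    pre, post = dict(net["pre"]), dict(net["post"])
    pre[t] = frozenset(inputs)     # pre-domain is a set of places
    post[t] = frozenset(outputs)   # post-domain is a set of places
    return {"P": net["P"], "pre": pre, "post": post}


# Derive a small net from the empty start net.
net = empty_net()
for place in ("p1", "p2", "p3"):
    net = add_place(net, place)
net = add_trans(net, "t", ["p1", "p2"], ["p3"])   # addTrans(2,1)
print(sorted(net["P"]), sorted(net["pre"]["t"]), sorted(net["post"]["t"]))
```

For the PTGG variant one would replace the sets of places by multisets carrying the arc weights; that change is not shown here.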


Definition 7 (elementary Petri net). An elementary Petri net is given by
N = (P, T, pre, post : T → 𝒫(P)) with a set P of places, a set T of transitions
and pre- and post-domain functions pre, post : T → 𝒫(P), where 𝒫(P) is the
power set of P. A morphism f : N → N' in ElemNets is given by f = (f_P :
P → P', f_T : T → T') compatible with the pre- and post-domain functions, i.e.
pre' ∘ f_T = 𝒫(f_P) ∘ pre and post' ∘ f_T = 𝒫(f_P) ∘ post.
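
The compatibility condition of Definition 7 can be checked mechanically for finite nets. The sketch below is an assumption-laden illustration: nets are represented as dictionaries with keys "P", "T", "pre" and "post", and the function name `is_elem_net_morphism` is chosen here and does not appear in the paper.

```python
def is_elem_net_morphism(N1, N2, f_P, f_T):
    """Check pre' . f_T == P(f_P) . pre and post' . f_T == P(f_P) . post."""
    def image(places):
        # P(f_P): direct image of a set of places under f_P.
        return frozenset(f_P[p] for p in places)

    return all(
        N2["pre"][f_T[t]] == image(N1["pre"][t])
        and N2["post"][f_T[t]] == image(N1["post"][t])
        for t in N1["T"]
    )


# Hypothetical nets: N1 has places p, q and one transition t.
N1 = {"P": {"p", "q"}, "T": {"t"},
      "pre": {"t": frozenset({"p", "q"})},
      "post": {"t": frozenset({"q"})}}
# N2 has a single place x with a self-loop transition u.
N2 = {"P": {"x"}, "T": {"u"},
      "pre": {"u": frozenset({"x"})},
      "post": {"u": frozenset({"x"})}}
# N3 is like N2, but u has an empty pre-domain.
N3 = {"P": {"x"}, "T": {"u"},
      "pre": {"u": frozenset()},
      "post": {"u": frozenset({"x"})}}

f_P = {"p": "x", "q": "x"}   # non-injective on places
f_T = {"t": "u"}

collapsing_ok = is_elem_net_morphism(N1, N2, f_P, f_T)
mismatch_ok = is_elem_net_morphism(N1, N3, f_P, f_T)
print(collapsing_ok, mismatch_ok)
```

Because pre- and post-domains are sets, identifying the places p and q simply collapses {p, q} to {x}; this is the point where elementary nets differ from place/transition nets, whose arc weights must be preserved.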


Fact 3 (elementary Petri nets as adhesive HLR category). The category 
(ElemNets, M) of elementary Petri nets is an adhesive HLR category, where 
M is the class of all injective morphisms. 


Proof idea. The category ElemNets is isomorphic to the comma category
ComCat(ID_Sets, 𝒫; I), where 𝒫 : Sets → Sets is the power set functor and
I = {1,2}. According to Thm. 1.4 it suffices to note that 𝒫 : Sets → Sets
preserves pullbacks using the fact that (Sets, M) is an adhesive HLR category.














Definition 8 (place/transition net). According to [14] a place/transition net
N = (P, T, pre, post : T → P^⊕) is given by a set P of places, a set T of
transitions, as well as pre- and post-domain functions pre, post : T → P^⊕, where
P^⊕ is the free commutative monoid over P. A morphism f : N → N' in PTNets
is given by f = (f_P : P → P', f_T : T → T') compatible with the pre- and
post-domain functions, i.e. pre' ∘ f_T = f_P^⊕ ∘ pre and post' ∘ f_T = f_P^⊕ ∘ post.
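
A finite place/transition net in the sense of Definition 8 can be modelled by representing elements of the free commutative monoid P^⊕ as multisets. The following Python sketch uses `collections.Counter` for this purpose; the dictionary layout and the names `extend` and `is_pt_net_morphism` are assumptions made for illustration.

```python
from collections import Counter


def extend(f_P, w):
    """f_P^(+): apply f_P to a multiset of places, adding up multiplicities."""
    out = Counter()
    for place, weight in w.items():
        out[f_P[place]] += weight
    return out


def is_pt_net_morphism(N1, N2, f_P, f_T):
    """Check pre' . f_T == f_P^(+) . pre and post' . f_T == f_P^(+) . post."""
    return all(
        N2["pre"][f_T[t]] == extend(f_P, N1["pre"][t])
        and N2["post"][f_T[t]] == extend(f_P, N1["post"][t])
        for t in N1["T"]
    )


# Hypothetical nets: t consumes p + q and produces 2q.
N1 = {"P": {"p", "q"}, "T": {"t"},
      "pre": {"t": Counter({"p": 1, "q": 1})},
      "post": {"t": Counter({"q": 2})}}
# u consumes 2x and produces 2x.
N2 = {"P": {"x"}, "T": {"u"},
      "pre": {"u": Counter({"x": 2})},
      "post": {"u": Counter({"x": 2})}}
# Like N2, but u consumes only one token from x.
N3 = {"P": {"x"}, "T": {"u"},
      "pre": {"u": Counter({"x": 1})},
      "post": {"u": Counter({"x": 2})}}

f_P = {"p": "x", "q": "x"}
f_T = {"t": "u"}

weights_ok = is_pt_net_morphism(N1, N2, f_P, f_T)
weights_bad = is_pt_net_morphism(N1, N3, f_P, f_T)
print(weights_ok, weights_bad)
```

Identifying p and q turns p + q into 2x, so the target transition must consume two tokens from x; the variant N3 with weight 1 is rejected.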


Fact 4 (place/transition nets as weak adhesive HLR category). The cat- 
egory (PTNets, M) of place/transition nets is a weak adhesive HLR category, 
but not an adhesive HLR category, if M is the class of all injective morphisms. 


Proof idea. The category PTNets is isomorphic to the comma category
ComCat(ID_Sets, □^⊕; I) with I = {1,2}, where □^⊕ : Sets → Sets is the
free commutative monoid functor. According to Thm. 1.4 it suffices to note that
□^⊕ : Sets → Sets preserves pullbacks along injective morphisms using the fact
that (Sets, M) is a weak adhesive HLR category. This implies that (PTNets,
M) is a weak adhesive HLR category.

It remains to show that (PTNets, M) is not an adhesive HLR category. This
is due to the fact that □^⊕ : Sets → Sets does not preserve general pullbacks;
if it did, pullbacks in PTNets would be constructed componentwise for
places and transitions. In fact, in Ex. 6 we present a non-injective pullback in
PTNets, where the transition component is not a pullback in Sets, and a cube
which violates the VK properties of adhesive HLR categories.
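
The failure of the free commutative monoid functor to preserve general pullbacks can be observed on a very small instance. The Python sketch below is not the example of the paper; it is an illustrative instance chosen here. It takes the pullback of the two unique functions from A = {a, b} and B = {c, d} to a one-point set, which in Sets is the product A × B, and compares multisets over A × B with pairs of multisets over A and B. If the functor preserved this pullback, the comparison map would be a bijection onto the pairs of equal size; the code finds two different multisets with the same image.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

A, B = ["a", "b"], ["c", "d"]
pullback_in_sets = list(product(A, B))   # A x B, the pullback over a one-point set


def project(w, i):
    """Image of the multiset w over A x B under the i-th projection."""
    out = Counter()
    for pair, weight in w.items():
        out[pair[i]] += weight
    return frozenset(out.items())        # hashable form of the multiset


seen = {}
collisions = []
# Enumerate all multisets of size 2 over A x B.
for pairs in combinations_with_replacement(pullback_in_sets, 2):
    w = Counter(pairs)
    key = (project(w, 0), project(w, 1))
    if key in seen:
        collisions.append((seen[key], w))
    else:
        seen[key] = w

for first, second in collisions:
    print(dict(first), dict(second))
```

The single collision is between (a,c) + (b,d) and (a,d) + (b,c): both project to a + b and to c + d. Both functions into the one-point set are non-injective, which is consistent with the statement that pullbacks along injective functions are preserved.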

































































Example 6 (non-VK square in PTNets). The square (1) in Fig. 1 with
non-injective morphisms g1, g2, p1, p2 is a pullback in the category PTNets, where
the transition component is not a pullback in Sets. In the cube in Fig. 1 the
bottom square is a pushout in PTNets along an injective morphism m ∈ M, all
side squares are pullbacks, but the top square is not a pushout in PTNets. Hence
we have a counterexample for the VK property.














In the following we combine algebraic specifications with Petri nets leading to
algebraic high-level (AHL) nets (see [11]). For simplicity we fix the
corresponding algebraic specification SP and the SP-algebra A. For the more general
case, where also morphisms between different specifications and algebras are
allowed, we refer to [11]. Under suitable restrictions for the morphisms we also obtain a
weak adhesive HLR category in the more general case (see for HLR proper- 
ties of high-level abstract Petri nets). 

Intuitively, an AHL net is a Petri net, where ordinary, uniform tokens are
replaced by data elements from the given algebra. Firing a transition t means to
remove some data elements from the input places and add some data elements,
computed by term evaluation, to the output places of t. There could also be some
firing conditions to restrict the firing behaviour of a transition. In addition, a
typing of the places restricts the data elements which could be put on each place
to those of a certain type.


Fig. 1. A pullback and a non-VK square in PTNets


Definition 9 (AHL net). An AHL net over (SP, A), where SP = (SIG, E, X)
has additional variables X and SIG = (S, OP), is given by N = (SP, P, T,
pre, post, cond, type, A) with sets P and T of places and transitions,
pre, post : T → (T_SIG(X) ⊗ P)^⊕ as pre- and post-domain functions,
cond : T → 𝒫_fin(Eqns(SIG, X)) assigning to each t ∈ T a finite set cond(t) of
equations over SIG and X, type : P → S a type function and A an SP-algebra.
Note that T_SIG(X) is the SIG-term algebra with variables X, (T_SIG(X) ⊗ P) =
{(term, p) | term ∈ T_SIG(X)_type(p), p ∈ P} and □^⊕ is the free commutative
monoid functor. A morphism f : N → N' in AHLNets(SP, A) is given by a
pair of functions f = (f_P : P → P', f_T : T → T') which are compatible with the
pre, post, cond and type functions as shown below.

Fact 5 (AHL nets as weak adhesive HLR category). Given an algebraic
specification SP and an SP-algebra A, the category (AHLNets(SP, A), M) of 
algebraic high-level nets over (SP, A) is a weak adhesive HLR category. M is 
the class of all injective morphisms f, i.e. fp and fr are injective. 

Proof idea. Since (SP, A) is fixed, the construction of pushouts
and pullbacks in AHLNets(SP, A) is essentially the same as in PTNets,
which is already a weak adhesive HLR category. We can apply the idea of comma
categories ComCat(F, G; I), where in our case the source functor of the
operations pre, post, cond, type is always the identity ID_Sets, and the target
functors are (T_SIG(X) ⊗ _)^⊕ : Sets → Sets and two constant functors. In fact
(T_SIG(X) ⊗ _) : Sets → Sets, the constant functors and □^⊕ : Sets → Sets
preserve pullbacks along injective functions. This implies that also
(T_SIG(X) ⊗ _)^⊕ : Sets → Sets preserves pullbacks along injective functions,
which is sufficient to verify the properties of a weak adhesive HLR category.


























Corollary 2 (main results for Petri net transformation systems). The
results stated in Cor. 1 are valid for Petri net transformation systems based on
the following categories:


1. (PTNets, M) (see Fact 4),
2. (ElemNets, M) (see Fact 3),
3. (AHLNets(SP, A), M) (see Fact 5).


Example 7 (place/transition net transformation). We present an example of a
place/transition net transformation system from [16], where a communication
network is created and analyzed w.r.t. liveness and safety properties. Here we
only consider the construction using Petri net transformations. The system is
composed of 3 components: a buffer, a printer and a communication unit depicted
in Fig. 2. The behaviour of the buffer and the printer is obvious from the
figure. The communication unit can send a message through a secure (SSC)
or non-secure (NSC) channel. Using the NSC channel a message may become
corrupted, therefore two copies of the message are sent, which are compared by
the receiving subunit D. If both copies differ (NOK), then the transmission has
to be repeated, otherwise (OK) it ends.

Fig. 2. Components of the system

Petri net transformations are used to connect these three components. In the
top row of Fig. 3(a) the production to connect buffer and printer is depicted.
Fig. 3(a) shows the whole Petri net transformation as the application of this
production to the components buffer and printer. In Fig. 3(b) and Fig. the
corresponding productions for connecting the communication unit with buffer
and printer are shown respectively. Applying all three productions leads to the
communication network depicted in Fig. 4.

Fig. 4. Resulting communication network


5 Conclusion 


In this paper we have shown how to extend adhesive HLR categories and systems
- recently introduced as a new categorical framework for graph transformation
in [5] [6] - to weak adhesive HLR categories and systems in order to be suitable
also as a unifying framework for Petri net transformations. It is interesting to
note that all the results for HLR systems based on adhesive HLR categories are
still valid under the weaker assumptions of weak adhesive HLR categories. But
we might need the stronger assumptions for results based on general pullback
constructions as considered in [8] [9].

In particular, we have shown in this paper that the category (PTNets, M) of 
place/transition nets with the class M of all monomorphisms is not an adhesive 
HLR category, but a weak adhesive HLR category. This is sufficient to show 
that the following main results of graph transformation systems are also valid 
for Petri net transformation systems: 

1. Local Church-Rosser Theorem 

2. Parallelism Theorem 

3. Concurrency Theorem 

We conjecture that also the following results 

4. Embedding and Extension Theorem 

5. Local Confluence Theorem 

stated explicitly in [5,6] for adhesive HLR systems are valid for our Petri 
net transformation systems considered above. The Embedding and Extension 
Theorem allows us to embed transformations into larger contexts, and with the 
Local Confluence Theorem we are able to show local confluence of transformation 
systems on the basis of the confluence of critical pairs. As additional properties 
we need a suitable E’-M’ pair factorization and initial pushouts for Petri nets, 
which have been shown for graphs already in [5,6]. 


From OBJ to Maude and Beyond 


José Meseguer 


University of Illinois at Urbana-Champaign, USA 
Dedicated to Joseph Goguen on his 65th Birthday 


Abstract. The OBJ algebraic specification language and its Eqlog and 
FOOPS multiparadigm extensions are revisited from the perspective of 
the Maude language design. A common thread is the quest for ever more 
expressive computational logics, on which executable formal specifica- 
tions of increasingly broader classes of systems can be based. Several 
recent extensions, beyond Maude itself, are also discussed. 


1 Introduction 


Joseph and I met for the first time in San Francisco on February 25, 1977 at 
the First (and last!) International Symposium on Category Theory Applied to 
Computation and Control [80]. We wrote our first paper together in 1977 [57]. 
We worked very closely together at SRI from 1980 to 1988, when the bulk of 
our joint published work appeared, and, after his departure to Oxford and his 
subsequent return to San Diego, we have continued collaborating in various ways. 
In honoring him as a friend, colleague, and mentor of those early years, I want 
to reflect on some great things we did together at SRI from the perspective of 
how they have influenced the work that other colleagues and I have done on 
Maude in the 1990s and in the present decade. Since Maude itself is evolving 
and expanding in different directions, my reflections will not only look at the 
past, but will also try to sketch what those directions, leading beyond Maude 
itself, look like. My views are necessarily subjective and partial, and my memory 
too; but that does not prevent me from trying to recollect things as best as I 
can, and from taking full responsibility for my own words and actions. 

One common thread of our joint work at SRI was the OBJ language. Joseph 
and I worked on OBJ1 with David Plaisted [63], and then, in the annus mirabilis 
1983-84, with Kokichi Futatsugi and Jean-Pierre Jouannaud on OBJ2 [47]. Then 
came OBJ3 [53], the most ambitious and far-reaching language design and im- 
plementation on which we worked with Claude and Hélène Kirchner, Patrick 
Lincoln, Aristide Mégrelis, and Timothy Winkler. A long paper combining in 
some way the OBJ2 and OBJ3 ideas appeared later [65], within an entire book 
dedicated to the OBJ experience [66]. I try to explain in this paper how not only 
OBJ, but also the Eqlog [59] and FOOPS [61] multiparadigm extensions of OBJ, 
on which Joseph and I also worked together at SRI, have influenced Maude. But 
to make better sense of all this, I think that it may be worthwhile to first present 
my own perspective on the specification language design challenges that we have 
been trying to meet all along, and which have motivated the design of each of 
these languages. 


1.1 System Specification Vs. Property Specification 


In discussing different uses of logic in computer science, considerable confusion 
can arise from lack of relevant distinctions. One that I have repeatedly found 
useful to clarify some key issues is the distinction between system specification 
and property specification. In a system specification we are after an unambiguous 
specification of a given system and how it actually works. In its most useful form, 
a system specification is executable and therefore provides an executable model 
of the system. Such specifications are enormously useful, since a system design 
can then be tested and analyzed in various ways, and it is possible to refine, 
sometimes even automatically, such an executable model into an actual system 
implementation. 

By contrast, when specifying properties of a system we are not necessarily 
after an executable model of our system. Instead, we assume it, as either al- 
ready given or to be developed later, and specify such properties in a typically 
nonexecutable manner: for example in first-order logic, higher-order logic, or 
some temporal logic. That is, the properties we specify have an intended model, 
namely the system design captured by a system specification, and we are in- 
terested in verifying by different methods that the intended model satisfies the 
properties stated in our property specification. 


1.2 System Specification in Computational Logics 


The above distinction brings us to the heart of a real problem: how can we 
formally, that is using logical and mathematical methods, verify a property if 
the system specification we have is informal, that is, if it does not precisely 
define a mathematical model of our system? This is indeed a genuine problem. 
Having a formal grammar is a necessary but insufficient condition: we also need 
a formal semantics. This is where the rub comes with system specifications based 
on conventional programming languages. For some such languages nobody has 
managed so far to give a complete formal semantics and therefore the only 
unambiguous “specifications” of some languages are their different compilers, 
which may exhibit different behaviors. Here is where computational logics can 
render an invaluable service. A computational logic can either: 


1. be used as a declarative programming language with a precise mathematical 
semantics to directly express system specifications; or 

2. be used to give a precise mathematical semantics to a conventional program- 
ming language, so that a system specified by a program in such a language 
will indirectly acquire a precise mathematical meaning in the computational 
logic. 

I have not yet defined what I mean by a computational logic. The simplest 
practical answer is: a logic that you can implement as a programming language. 
That is, you can define and implement a programming language whose programs 
are exactly theories in the given logic and whose program execution is logical 
deduction. You then call such a language a declarative programming language. 
The point is that from the earliest times of computability theory, logical 
formalisms and mathematical definitions of computability have gone hand in hand. 
For example, Herbrand-Gödel computable functions are defined by equational 
theories; and Church computability is defined in terms of the lambda calculus. 
Over time, this has given rise to various declarative programming languages. For 
example, pure Prolog is a declarative programming language associated with Horn 
logic; pure ML and Haskell are declarative programming languages associated with 
the typed lambda calculus; OBJ is a declarative programming language based on 
order-sorted equational logic; and Maude is a declarative programming language 
based on rewriting logic. 
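
To make the slogan "program execution is logical deduction" concrete, here is a minimal Python sketch (not OBJ or Maude code) in which the "program" is the pair of equations for Peano addition and execution is repeated replacement of equals by equals, used left to right. The tuple encoding of terms and the names `rewrite_step` and `normalize` are assumptions made only for this illustration.

```python
# Terms are nested tuples: ("0",), ("s", t), ("+", t1, t2).
ZERO = ("0",)

def s(t):
    return ("s", t)

def plus(a, b):
    return ("+", a, b)

def rewrite_step(t):
    """Apply one equation, left to right, somewhere in t; None if t is in normal form."""
    if t[0] == "+":
        a, b = t[1], t[2]
        if a == ZERO:                  # equation: 0 + N = N
            return b
        if a[0] == "s":                # equation: s(M) + N = s(M + N)
            return s(plus(a[1], b))
    # No equation applies at the top: try to rewrite inside an argument.
    for i in range(1, len(t)):
        r = rewrite_step(t[i])
        if r is not None:
            return t[:i] + (r,) + t[i + 1:]
    return None

def normalize(t):
    """Each loop iteration is one deduction step; stop when no equation applies."""
    while True:
        r = rewrite_step(t)
        if r is None:
            return t
        t = r

two = s(s(ZERO))
print(normalize(plus(two, two)))
```

Every intermediate term is provably equal to the starting term in the equational theory, so the final normal form is both the result of the computation and the conclusion of a proof.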

One can always blur the above distinctions, but this is not very helpful. For 
example, there is always the Quixotic and amusing possibility of declaring that 
everything is a logic!, including, say, C++, thus arriving at a toothless notion 
of “logic”. The opportunities for confusion and obscurantism are indeed endless; 
but such verbal games are for the most part a waste of time. Furthermore, it 
is possible to give meta-logical requirements for declarative programming lan- 
guages that cut through silly verbal games of this kind: Joseph and I gave such 
requirements in terms of institutions in [60]; and I gave more detailed require- 
ments in terms of general logics in [85]. 


1.3 The Quest for More Expressive Computational Logics 


A lot of water has gone under the bridges since the 1930s. Founding computa- 
tion on a theory of recursive functions was a great achievement at its time and 
is still very useful today; but it is clearly a limited theory, and we know it. There 
is, for example, no meaningful way of thinking of internet computations as de- 
finable by recursive functions. Massive changes in the nature of computing and 
emergence of entirely new applications do not make older computational logics 
and declarative languages incorrect or useless; but they can make them limited, 
relegated to specific niches. If a wider, more general applicability beyond such 
niches is desired, computational logics are typically in need of either generaliza- 
tion or replacement. One good example is functional programming, which is of 
course a very elegant and powerful way of programming functional applications. 
It is certainly possible to add bells and whistles to a functional language, for 
example by grafting a process calculus on top of it, so as to make it suitable for 
nonfunctional applications such as distributed computing. But what is the logic 
of such a centaur? The fact that it can be given a semantics, just as Java can, 
proves nothing, since the real issue is whether the resulting language remains 
declarative in the precise sense of programs being theories in a logic, for a decent 
meta-theoretic notion of logic, and computation being deduction in such a logic. 
Therefore, to preserve the declarative nature of a language, when extending 
it to cover new application areas, one should think primarily of how its under- 
lying logic can be extended, and only secondarily about the extended syntax: 
declarative language design is primarily a task of logic design. The design space 
is therefore the space, in fact the category, of logics. But there are tight design 
constraints and tradeoffs that require good judgment. Not all logics are compu- 
tational; and having a recursively enumerable set of deducible formulas is not a 
sufficient condition: first-order logic has that, but it is hopeless as a program- 
ming language. The logic has to remain lean and mean in order to allow efficient 
implementations as a programming language, and not just as a theorem prover. 
Yet, the whole point of an extension is to make the logic more expressive. How 
to achieve both goals in an optimal way is the challenge. 

OBJ and its extensions are a good case in point. As algebraic 
specification/equational programming languages, OBJ2 [47] and OBJ3 [53,65] were 
arguably the most expressive such languages in the 1980s. But they were, by the 
very nature of their underlying order-sorted equational logic and their 
associated operational semantics [52,70], functional languages. Extending OBJ in 
a multiparadigm way was a task that Joseph and I undertook in the mid 1980s, 
resulting in two new language designs: Eqlog [59], and FOOPS [61]. Eqlog 
unified functional/equational programming and Horn-logic programming; its logic 
design task was to embed order-sorted equational logic and Horn logic without 
equality into a suitable Horn logic with equality [60]. FOOPS unified 
equational/functional programming, Horn-logic programming, and object-oriented 
programming. Although an underlying model-theoretic semantics was given, 
based on algebraic data types with hidden sorts and behavioral equivalence 
between them in the sense of [58,94], FOOPS fell short of having an underlying logic 
with modules as theories and computation as deduction. This was remedied later, 
by theoretical developments presenting various proposals for a hidden or 
“observational” equational logic [50,51,56,55,122,64,68,11,10,9,115,116,117,30,120]. 
In hindsight, one can view CafeOBJ [46], BOBJ [54] and BMaude [96] as 
full-blooded declarative languages that achieve in a more satisfactory way many of 
the FOOPS goals. 


1.4 Rewriting Logic and Maude 


With rewriting logic and Maude [86,90,18,19], several of us undertook 
the task of unifying within a single declarative language: (i) equational/functional 
programming; (ii) object-oriented programming; and (iii) concurrent/distributed 
programming. That (iv) Horn-logic programming was also naturally embeddable 
in this framework was clear from the early stages of this project [89,90], but at 
the operational semantics level this required a generalization of narrowing that 
was achieved later [132,133]. Three more insights emerged over time as part of 
different research collaborations: (v) that real-time and hybrid systems could be 
naturally specified in rewriting logic [108]; (vi) that higher-order type theory 
was naturally embeddable in rewriting logic [130]; and (vii) that probabilistic 
systems were likewise expressible in a natural probabilistic extension of 
rewriting logic and could be simulated within rewriting logic itself [73,4]. In spite of 
being multiparadigm in all the above (i)-(vii) ways, rewriting logic remains 
remarkably lean and mean: it is a very simple formalism and, thanks to Steven 
Eker, has a very high-performance Maude implementation. Modules are indeed 
theories in the logic, and nothing more. Computation is deduction, and theories 
have initial models [88,13], which give semantics to modules and support 
inductive reasoning. Furthermore, operational properties such as termination can 
be usefully formulated and verified by adopting this logical/deductive viewpoint 
[36,79]. 


2 From Order-Sorted to Membership Equational Logic 


Rewriting logic contains membership equational logic as a sublogic. In 
Maude’s language design this is reflected in its sublanguage of functional mod- 
ules, for equational theories with initial semantics, and of functional theories 
for equational theories with “loose” semantics. Therefore, in relating OBJ and 
Maude the first task at hand is relating their corresponding equational logics. 

One key reason why OBJ2 and OBJ3 were so expressive was their order- 
sorted type theory. That one should use types to make any reasonable sense of 
algebraic specifications goes without saying. But the problem with many-sorted 
equational logic is that it does not deal well with partiality. Many simple oper- 
ations, such as selectors in data structures or just simple arithmetic operations, 
are partial. To the embarrassment of many-sorted specifications, simple trade 
examples, such as the perennial stacks or the rational numbers, cannot be given 
elegant many-sorted specifications: the top of the empty stack or division by zero 
raise their ugly heads and require ugly ad-hoc solutions. 

The appeal of order-sorted equational logic [62] is that, by allowing the ex- 
pressive power of subtypes (subsorts), many partial functions become total on 
appropriate subsorts. Furthermore, function symbols can be subsort overloaded, 
which is very convenient in practice. But there are limits to the kind of partiality 
expressible by typing means alone, which are those available in order-sorted al- 
gebra. When the definedness of a function depends on semantic conditions such 
as, for example, the fact that for the concatenation of two paths in a graph to 
be defined the target node of the first must coincide with the source node of 
the second, order-sorted equational logic is not enough. This was understood 
early on, and led to formulating notions of unconditional [49] or conditional [52] 
sort constraints; but how to extend order-sorted equational logic so as to fully 
account for conditional sort constraints remained an open question. 

The appeal of membership equational logic (MEL) [92] is that it gives a full 
account of partiality, and even a systematic, functorial way of relating partial 
and total specifications [92,95]. Furthermore, as shown in [92], it embeds in a 
conservative way the “right” version of order-sorted equational logic, one that 
solves several anomalies, including the lack of pushouts of theory morphisms, 
in the version given in [62]. But does membership equational logic remain lean 
and mean? The relevant facts are that it: (i) has a well-developed operational 
semantics by rewriting (see the systematic study [12], which also deals with many 
other automated deduction techniques); (ii) enjoys a high-performance Maude 
implementation; (iii) is a quite simple logic; and (iv) has initial and free models 
[92], on which inductive proof methods and inductive theorem proving tools can 
be based [12,20]. From these facts it seems fair to conclude that the answer is 
definitely yes. 
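
As an illustration of the kind of semantic definedness condition that typing alone cannot express, the path concatenation example mentioned earlier can be sketched as a single conditional membership axiom, writing t : s for "term t has sort s". The sort name Path and the operators _;_, source and target are illustrative assumptions, not notation taken from the cited papers.

```latex
\[
  p : \mathit{Path} \;\wedge\; q : \mathit{Path} \;\wedge\;
  \mathit{target}(p) = \mathit{source}(q)
  \;\Rightarrow\; (p \,;\, q) : \mathit{Path}
\]
```

Under this reading, when the target of p differs from the source of q the axiom simply does not fire, so no sort is derived for p ; q and the concatenation counts as undefined.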


In summary, therefore, we can view OBJ3 as a sublanguage of Maude’s 
functional sublanguage. The generalization from OBJ3 to Maude is further stressed 
by the fact that Maude supports order-sorted notation as convenient syntactic 
sugar for membership equational logic axioms. In membership equational logic 
atomic propositions are either equations $t = t'$, or memberships $t : s$, 
stating that term $t$ has sort $s$. A subsort declaration $s < s'$ is then just syntactic 
sugar for a conditional axiom $x : s \Rightarrow x : s'$. Similarly, an order-sorted 
operator declaration $f : s_1 \ldots s_n \rightarrow s$ is syntactic sugar for the conditional axiom 
$x_1 : s_1 \wedge \ldots \wedge x_n : s_n \Rightarrow f(x_1, \ldots, x_n) : s$. 
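
The desugaring just described can be sketched in a few lines of Python (not Maude): subsort and operator declarations are read as membership axioms, and the sorts of a ground term are whatever those axioms derive. The sort names, the small signature and the function `sorts_of` are assumptions chosen for illustration; a real membership equational theory may also derive sorts with the help of equations, which this sketch ignores.

```python
# Subsort declarations s < s', read as axioms  x : s  =>  x : s'.
SUBSORTS = {("NzNat", "Nat"), ("Nat", "Rat"), ("NzNat", "NzRat"), ("NzRat", "Rat")}

# Operator declarations f : s1 ... sn -> s, read as axioms
#   x1 : s1 /\ ... /\ xn : sn  =>  f(x1, ..., xn) : s.
OPS = [
    ("0", (), "Nat"),
    ("2", (), "NzNat"),
    ("/", ("Rat", "NzRat"), "Rat"),
    ("/", ("NzRat", "NzRat"), "NzRat"),
]

def close_upwards(sorts):
    """Close a set of sorts under the subsort axioms."""
    sorts = set(sorts)
    changed = True
    while changed:
        changed = False
        for lo, hi in SUBSORTS:
            if lo in sorts and hi not in sorts:
                sorts.add(hi)
                changed = True
    return sorts

def sorts_of(term):
    """Return every sort derivable for a ground term (op, arg1, ..., argn)."""
    op, args = term[0], term[1:]
    arg_sorts = [sorts_of(a) for a in args]
    derived = set()
    for name, arity, result in OPS:
        if name == op and len(arity) == len(args):
            if all(s in have for s, have in zip(arity, arg_sorts)):
                derived.add(result)
    return close_upwards(derived)

zero, two = ("0",), ("2",)
print(sorted(sorts_of(("/", two, two))))   # sorts derived for 2/2
print(sorted(sorts_of(("/", two, zero))))  # 2/0: no membership axiom applies
```

The second call returns the empty set: the division axioms need a nonzero second argument, so no membership can be derived for 2/0.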


A membership equational theory is a pair $(\Sigma, H)$ with $\Sigma$ a signature 
specifying the kinds, sorts, and function symbols, and with $H$ a set of Horn clauses 
involving both equations and memberships. Kinds classify potentially 
meaningful expressions, and sorts within a kind classify actually defined expressions. 
Terms having a kind but not a sort correspond to undefined or error 
expressions. For example, 2/0 is in the Number kind but has no sort. For execution 
purposes we typically impose some requirements on such a theory. First of all, 
its Horn clauses $H$ may be decomposed as a union $E \cup A$, with $A$ a set of 
equations that we will reason modulo (for example, $A$ may include associativity, 
commutativity and/or identity axioms for some of the operators in $\Sigma$). Second, 
the remaining Horn clauses $E$ are typically required to be Church-Rosser¹ 
modulo $A$, so that we can use the conditional equations in $E$ as equational rewrite 
rules modulo $A$. Third, for some applications it is useful to make the equational 
rewriting relation² context-sensitive [76,77]. This can be accomplished by 
specifying a function $\mu : \Sigma \rightarrow \mathbb{N}^*$ assigning to each function symbol $f \in \Sigma$ (with, 
say, $n$ arguments) a list $\mu(f) = i_1 \ldots i_k$ of argument positions, with $1 \leq i_j \leq n$, 
which must be fully evaluated (up to the context-sensitive equational 
reduction strategy specified by $\mu$) in the order specified by the list $i_1 \ldots i_k$ before 
applying any equations whose lefthand sides have $f$ as their top symbol. For 
example, for $f$ = if_then_else_fi we may give $\mu(f) = \{1\}$, meaning that the first 
argument must be fully evaluated before the equations for if_then_else_fi are 
applied.³ Therefore, for execution purposes we can specify a membership 
equational theory as a triple $(\Sigma, E \cup A, \mu)$, with $A$ the axioms we rewrite modulo, and 
with $\mu$ the map specifying the context-sensitive equational reduction strategy. 
A Maude functional module is then, essentially, a specification of the form fmod 
$(\Sigma, E \cup A, \mu)$ endfm. 


1 See for a detailed study of equational rewriting concepts and proof techniques 
for MEL theories. 

2 As we shall see, in a rewrite theory $\mathcal{R}$ rewriting can happen at two levels: (1) 
equational rewriting with (possibly conditional) equations $E$; and (2) non-equational 
rewriting with (possibly conditional) rewrite rules $R$. These two kinds of rewriting 
are different. Therefore, to avoid confusion I will qualify rewriting with equations as 
equational rewriting. 
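
The effect of an evaluation strategy such as the one given above for if_then_else_fi can be sketched in Python (not Maude). The strategy map lists, per operator, the argument positions that are reduced before equations are tried at the top; operators without an entry reduce all arguments left to right. The term encoding, the `Diverges` exception standing in for a non-terminating subterm, and the names `apply_top` and `reduce_term` are all assumptions of this sketch.

```python
class Diverges(Exception):
    """Stands in for a subterm whose evaluation would not terminate."""

def apply_top(op, args):
    """Equations applied at the top symbol; None means no equation applies."""
    if op == "if" and args[0] == ("true",):
        return args[1]                       # if true then X else Y fi = X
    if op == "if" and args[0] == ("false",):
        return args[2]                       # if false then X else Y fi = Y
    if op == "not" and args[0] == ("true",):
        return ("false",)
    if op == "not" and args[0] == ("false",):
        return ("true",)
    if op == "loop":
        raise Diverges("loop has no normal form")
    return None

def reduce_term(term, strat):
    """Reduce the argument positions listed in strat[op], then try the top."""
    op, args = term[0], list(term[1:])
    positions = strat.get(op, range(1, len(args) + 1))  # default: every argument
    for i in positions:
        args[i - 1] = reduce_term(args[i - 1], strat)
    result = apply_top(op, args)
    if result is None:
        return (op, *args)
    return reduce_term(result, strat)

t = ("if", ("not", ("false",)), ("one",), ("loop",))
print(reduce_term(t, {"if": [1]}))       # only the condition is evaluated first
try:
    reduce_term(t, {"if": [1, 2, 3]})    # an eager strategy touches the else branch
except Diverges as err:
    print("eager strategy:", err)
```

With the strategy that evaluates only the first argument, the diverging else branch is discarded without ever being reduced, which is the practical point of making equational rewriting context-sensitive.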


3 Rewriting Logic: From OBJ to Maude 


As already mentioned, the whole point of rewriting logic and Maude 
[86,90,18,19] was to unify within a single logic and associated declarative 
language: (i) equational/functional programming; (ii) object-oriented programming; 
and (iii) concurrent/distributed programming. For this unification, a purely 
equational/functional framework would be clearly unsuitable.⁴ The challenge 
therefore was to find a lean and mean superlogic of equational logic in which 
this unification could take place. 

A related challenge was to make some sense of the quite diverse menagerie 
of concurrency models that were around, often competing with each other as 
the “right” model of concurrency. A key strategy in this competition game was 
to produce, sometimes quite complicated, translations from other models, ad- 
duced as proof of universality of the proposed model. Implicit in this strategy 
was the belief that, given enough time, the right model, capable of expressing 
all the relevant concurrency concepts would emerge. This search for the Holy 
Grail of concurrency is certainly a chivalrous one; but I find serious grounds 
for being skeptical about its success. The main difficulty is that concurrency 
encompasses a very wide range of phenomena: there are concurrent functional 
programs, concurrent grammars, dataflow networks, actors, Petri nets of various 
ilks and colors, various synchronous and asynchronous process calculi, neural 
networks, and so on. Although translations between some of these models are 
possible, the fact that in this way some concurrency features can simulate others, 
perhaps in a complex way, is not particularly helpful. 

In my view, what was missing was a computational logic for concurrency that 
could serve as a semantic framework in which different concurrency models could 
be naturally unified without requiring any translations. That is, in a logic one 
can define quite different theories which have associated models. The logic then 
allows one to understand in a unified way all such models as models in the same 
logic; but there is plenty of room for diversity between them. Furthermore, once 
we understand that a logical framework of this kind can give us an enormous 
range of possibilities for naturally expressing different concurrency phenomena, 
we realize that we can have a general framework without in any way needing a 
general model, whatever that means. 


3 As in OBJ2-3, in Maude maps $\mu$ specifying context-sensitive equational reduction 
strategies are called evaluation strategies [47,40,18], and $\mu(f) = i_1 \ldots i_k$ is 
specified with the strat keyword followed by the string ($i_1 \ldots i_k$ 0), with 0 indicating 
evaluation at the top of the function symbol $f$. For an in-depth study of the 
relationship between OBJ/Maude evaluation strategies and context-sensitive rewriting 
see [76,77]. 

4 The key point is that concurrency and nondeterminism cannot be directly modeled 
in an equational/functional framework, which typically assumes determinism in the 
form of a Church-Rosser property. Therefore, one needs special devices to model 
some concurrency aspects indirectly. Two good examples of indirectly modeling 
concurrency within a purely functional framework are the ACL2 semantics of the JVM 
using a scheduler [101], and the use of lazy data structures in Haskell to analyze 
cryptographic protocols [7]. 

Is rewriting logic a suitable general framework in exactly this sense? The 
answer is necessarily an empirical one, and can never be claimed to be definitive. 
But the amount of positive evidence gathered up to now, thanks to the research 
of different people and covering indeed a very wide range of concurrency models, 
is in my view very strong. The key point is the naturalness and directness with 
which different concurrency models can be expressed as rewrite theories. It is 
not a matter of complicated encodings: typically the original representations of 
a model and those of its associated rewrite theory are isomorphic. Since all this 
is a matter carefully documented in many papers and in several rewriting logic 
surveys, I will not go over the, indeed quite large, body of work backing the view 
that rewriting logic is a very expressive general framework for concurrency. I refer 
the reader to the survey paper [82]; and for an explanation of how rewriting logic 
unifies and improves upon other semantic frameworks such as algebraic semantics 
and structural operational semantics (SOS) to the more recent papers [97,98]. 


3.1 Rewrite Theories: Their Execution and Formal Analysis 


A rewrite theory is a tuple $\mathcal{R} = (\Sigma, E \cup A, \mu, R, \phi)$, with: (1) $(\Sigma, E \cup A, \mu)$ a 
membership equational theory with “modulo” axioms $A$ and context-sensitive 
equational reduction strategy $\mu$; (2) $R$ a set of labeled conditional rewrite rules 
of the general form 

$$ r : (\forall X)\; t \rightarrow t' \;\; \mathit{if} \;\; \Big(\bigwedge_i u_i = u'_i\Big) \wedge \Big(\bigwedge_j v_j : s_j\Big) \wedge \Big(\bigwedge_l w_l \rightarrow w'_l\Big) \qquad (1) $$

where the variables appearing in all terms are among those in $X$, terms in each 
rewrite or equation have the same kind, and in each membership $v_j : s_j$ the 
term $v_j$ has kind $[s_j]$; and (3) $\phi : \Sigma \rightarrow \mathcal{P}(\mathbb{N})$ a mapping assigning to each 
function symbol $f \in \Sigma$ (with, say, $n$ arguments) a set $\phi(f) = \{i_1, \ldots, i_k\}$, 
$1 \leq i_1 < \ldots < i_k \leq n$, of frozen argument positions⁵ under which it is forbidden 
to perform any rewrites. 

5 In Maude, $\phi(f) = \{i_1, \ldots, i_k\}$ is specified by declaring $f$ with the frozen attribute, 
followed by the string ($i_1 \ldots i_k$). Although they arose from a quite different 
motivation, frozen operators have some similarities with notions such as “non-coherent 
operators” in CafeOBJ [46], and “non-congruent” operators in BOBJ [54]. 

Intuitively, $\mathcal{R}$ specifies a concurrent system, whose states are elements of the 
initial algebra $T_{\Sigma/E \cup A}$ specified by $(\Sigma, E \cup A)$, and whose concurrent transitions 
are specified by the rules $R$, subject to the frozenness requirements imposed by $\phi$. 
The frozenness information is important in practice to forbid certain rewritings. 
For example, when defining the rewriting semantics of a process calculus, one 
may wish to require that in prefix expressions $a.P$ the operator $\_.\_$ is frozen 
in the second argument, that is, $\phi(\_.\_) = \{2\}$, so that $P$ cannot be rewritten 
under a prefix. Note that a rewrite theory $\mathcal{R} = (\Sigma, E \cup A, \mu, R, \phi)$ specifies two 
kinds of context-sensitive rewriting requirements: (1) equational rewriting with 
$E$ modulo $A$ is made context-sensitive by $\mu$; and (2) non-equational rewriting 
with $R$ is made context-sensitive by $\phi$. But the maps $\mu$ and $\phi$ impose different 
types of context-sensitive requirements: (1) $\mu(f)$ specifies a list of arguments 
where we are allowed to rewrite with equations in $E$; and (2) $\phi(f)$ specifies 
arguments where we are forbidden to rewrite with the rules $R$. The maps $\mu$ and 
$\phi$ substantially increase the expressive power of rewriting logic, because various 
order-of-evaluation and context-sensitive requirements, which would have to be 
specified by explicit rules in a formalism like SOS, become implicit and are 
encapsulated in $\mu$ and $\phi$. 
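
The role of frozen argument positions can be sketched in Python (not Maude) with a toy process calculus. The single rewrite rule, which lets a silent prefix tau.P step to P, the operator names and the function `one_step` are assumptions made for illustration; only the idea that rewriting is forbidden below a frozen position comes from the text.

```python
FROZEN = {"prefix": {2}}   # frozen second argument of the prefix operator

def rule_step(term):
    """Top-level instances of the single rule  tau.P -> P."""
    if term[0] == "prefix" and term[1] == ("tau",):
        return [term[2]]
    return []

def one_step(term, frozen):
    """All terms reachable by one rule application at a non-frozen position."""
    results = list(rule_step(term))
    op, args = term[0], term[1:]
    for i, arg in enumerate(args, start=1):
        if i in frozen.get(op, set()):
            continue                       # no rewriting under a frozen argument
        for new_arg in one_step(arg, frozen):
            results.append((op, *args[:i - 1], new_arg, *args[i:]))
    return results

nil, tau, a = ("nil",), ("tau",), ("a",)
# tau.nil | a.tau.nil
t = ("par", ("prefix", tau, nil), ("prefix", a, ("prefix", tau, nil)))

print(one_step(t, FROZEN))   # only the left component can move
print(one_step(t, {}))       # without frozenness the tau under the prefix a also fires
```

With the frozen declaration there is exactly one successor state; dropping it adds a second one in which the inner tau under the prefix a has been rewritten, which is the behavior the frozenness requirement is meant to rule out.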

For execution purposes a rewrite theory $\mathcal{R} = (\Sigma, E \cup A, \mu, R, \phi)$ should satisfy 
some basic requirements that are assumed to hold for Maude system modules. 
Such modules are specifications of the form mod $(\Sigma, E \cup A, \mu, R, \phi)$ endm. First, 
in the membership equational theory $(\Sigma, E \cup A, \mu)$, $E$ should be ground 
Church-Rosser modulo $A$ (for $A$ a set of equational axioms for which matching modulo 
$A$ is decidable) and ground terminating modulo $A$, up to the context-sensitive 
strategy $\mu$.⁶ Second, the rules $R$ should be coherent with $E$ modulo $A$ [136]; 
intuitively, this means that, to get the effect of rewriting in equivalence classes 
modulo $E \cup A$, we can always first simplify a term with the equations in $E$ to 
its canonical form modulo $A$, and then rewrite with a rule in $R$. Finally, the 
rules in $R$ should be admissible [18], meaning that in a conditional rewrite rule 
of the form (1), besides the variables appearing in $t$ there can be extra variables 
in $t'$, provided that they also appear in the condition and that they can all be 
incrementally instantiated by either matching a pattern in a “matching equation” 
or performing breadth-first search in a rewrite condition (see [18] for a detailed 
description of admissible equations and rules). 

6 $\mu$-termination is a weaker requirement than termination [77]; the interactions 
between context-sensitive rewriting and the Church-Rosser property are somewhat 
subtle [75,78]. 

Computation in Maude is then deduction with the inference rules of rewriting 
logic (see [13]) that are efficiently implemented by the Maude engine under the 
above executability assumptions. Specifically, equivalence classes modulo $E \cup A$ 
are represented by their unique canonical forms modulo $A$. That is, Maude 
performs equational rewriting to reach a canonical form with the equations in $E$ 
modulo $A$ by means of the reduce command. This is entirely analogous to OBJ’s 
reduce command for equational specifications, but applies now to more general 
theories. It also supports two variants of fair rewriting with the rules $R$ modulo 
$A$ which, in combination with equational rewriting and under the coherence 
assumption, achieves the effect of rewriting with $R$ in $(E \cup A)$-equivalence classes. 
These two commands are the rule-fair rewrite command, and the rule- and 
position-fair frewrite command which, for object-based systems (see Section 3.3), 
is also object and message fair. Furthermore, the context-sensitive requirements 
provided by $\mu$ and $\phi$ are always respected. Since the rules $R$ need not be confluent 
and may be highly nondeterministic, the rewrite and frewrite commands give 
just one execution path among many others. This is still very useful for execution 
and simulation purposes, but for analysis purposes Maude’s search command 
supports a systematic breadth-first exploration of all rewrite paths until states 
matching a specified pattern and satisfying specified semantic conditions are 
reached. For example, we may want to know whether the concurrent system 
specified by our rewrite theory satisfies a given invariant (say, is deadlock-free). 
We can then search for a reachable state satisfying the negation of the given 
invariant. Within the practical limitations of time and memory, the search 
command then gives us a semi-decision procedure for the failure of such invariants, 
regardless of the in general infinite number of reachable states of our systems. 
Furthermore, for systems whose sets of reachable states are finite, Maude also 
provides a decision procedure for the satisfaction of linear-time temporal logic 
(LTL) properties. This is achieved through its built-in MODEL-CHECKER module 
which, in the experiments that we have evaluated [41,42], performs explicit-state 
on-the-fly model checking of LTL formulas with efficiency comparable to that of 
the SPIN model checker [69]. 
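
The idea behind searching for a violation of an invariant can be sketched in Python as an ordinary breadth-first exploration of the one-step successor relation. This is not Maude's implementation: the function `search`, the `max_states` bound and the two-process toy system are assumptions made purely for illustration.

```python
from collections import deque

def search(initial, successors, is_bad, max_states=10_000):
    """Breadth-first search for a reachable state satisfying is_bad.

    Returns the path from initial to the first bad state found, or None if
    none was found before the frontier was exhausted or the bound was hit.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue and len(parent) <= max_states:
        state = queue.popleft()
        if is_bad(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

STEP = {"idle": "wait", "wait": "crit", "crit": "idle"}

def successors(state):
    """Each process may advance on its own; there is no lock in this toy system."""
    return [state[:i] + (STEP[p],) + state[i + 1:] for i, p in enumerate(state)]

# Invariant: at most one process is critical. Search for its negation.
path = search(("idle", "idle"), successors, lambda s: s.count("crit") > 1)
print(path)
```

Because the exploration is breadth-first, a violating state is found whenever one is reachable and resources suffice, even if the state space is infinite; a None answer is conclusive only when the whole reachable state space has been exhausted, which matches the semi-decision character described above.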


3.2 Module Algebra: The Power of Reflective Thinking 


One of the most powerful features of OBJ2 and OBJ3 was the possibility of 
defining parameterized modules having semantic requirements for their instan- 
tiation specified in the form of parameter theories. Such modules could then be 
instantiated by means of views (theory interpretations) in the typical pushout 
construction way of Clear [14]. They could also be renamed, and instantiations 
and renamings could be composed in very expressive module expressions (see 
[47,65]). This supported a very powerful discipline of parameterized
programming that inspired similar mechanisms in ML and in module interconnection
languages such as LILEANNA [135]. In hindsight, however, there were two lim- 
itations. The first was that it took in practice a long time (several years of hard 
work) to properly implement this part of the language. Indeed, it proved to 
be the most complex and sophisticated component of OBJ3’s LISP-based imple- 
mentation. The second limitation, much less apparent to us at the time, was that 
OBJ’s module algebra, while very powerful, was a closed algebra, in the sense of 
offering a fixed repertoire of theory operations. Of course, one could have imag- 
ined other operations, but this would have required both a new metatheory and 
big implementation efforts. 

An important breakthrough at the theoretical level was the formulation of a 
general axiomatic notion of reflective logic by Manuel Clavel and myself in [23], 
followed by a series of papers, a Ph.D. thesis, and a book, showing that several 
conditional and unconditional versions of rewriting logic, as well as membership 
equational logic and many-sorted Horn logic with equality, are indeed reflective 
[24,15,16,25,26]. Intuitively, a logic is reflective if it can represent its metalevel
at the object level in a sound and coherent way. Specifically, rewriting logic can
represent its own theories and their deductions by having a finitely presented 
rewrite theory U that is universal, in the sense that for any finitely presented 
rewrite theory R (including U itself) we have the following equivalence 


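\[ R \vdash t \rightarrow t' \iff U \vdash \langle \overline{R}, \overline{t} \rangle \rightarrow \langle \overline{R}, \overline{t'} \rangle, \]


where $\overline{R}$ and $\overline{t}$ are terms representing R and t as data elements of U. Since U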
is representable in itself, we can achieve a “reflective tower” with an arbitrary 
number of levels of reflection. 

Reflection is a very powerful property: it allows defining rewriting strategies 
by means of metalevel theories that extend U and guide the application of the 
rules in a given object-level theory R [24683]; it is efficiently supported in the 
Maude implementation by means of descent functions [17], implemented in the
built-in META-LEVEL module; it can be used to build a variety of theorem proving
and theory transformation tools [15,20,21,27]; and it can also be used to prove
metalogical properties about families of theories in rewriting logic, and about
other logics represented in the rewriting logic (meta-)logical framework [5,22,6].

From the module algebra point of view, the key advantage is that the univer- 
sal theory U, and the META-LEVEL module that implements key descent functions 
for it, have a sort Module whose terms represent finitary rewrite theories. This 
means that theories become data that can be manipulated within the logic in 
a declarative way. Similar sorts, defining data types for parameterized modules 
and for views, can likewise be easily defined in extensions of the META-LEVEL 
module. In this way, Francisco Durán and I showed that many powerful theory 
composition operations endowing Maude with a module algebra can be defined 
within the logic. Furthermore, the module algebra so defined now becomes
easily extensible. For example, parameterized modules, and the way in which
they are instantiated, do not necessarily have to follow a pushout-like
pattern. Different forms of parameterization, understood
as new metalevel functions, can be easily defined. For instance, it is very easy to 
define in the Full Maude extension of Maude a TUPLE(n) module that for each 
nonzero natural number n provides a parameterized module of n-tuples [32]. 
Indeed, reflection has allowed considerable flexibility in easily defining and ex- 
perimenting with different module composition operations before implementing 
some of them in the underlying Core Maude system, as has been recently done 
in Maude 2.2. Furthermore, Full Maude itself has been an excellent basis for 
building other Maude extensions such as Real-Time Maude (see Section 4.1), a
strategy language for Maude [83], and the Maude termination tool (MTT) [36]. 
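
The idea of modules as data manipulated by metalevel functions can be sketched
in Python. This is a loose analogy only: the Op and Module records, the
tuple_module function, and all sort and operator names below are hypothetical
and do not reproduce the META-LEVEL or Full Maude representations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    arity: tuple    # argument sorts
    coarity: str    # result sort

@dataclass(frozen=True)
class Module:
    name: str
    params: tuple   # formal parameter labels
    sorts: tuple
    ops: tuple

def tuple_module(n):
    """Metalevel function: build the data representation of an n-tuple module."""
    if n < 1:
        raise ValueError("n must be a nonzero natural number")
    params = tuple(f"X{i}" for i in range(1, n + 1))
    elts = tuple(f"{p}$Elt" for p in params)   # element sort of each parameter
    ops = (Op("tuple", elts, "Tuple"),)        # the n-ary constructor
    # One projection per component.
    ops += tuple(Op(f"p{i}", ("Tuple",), elts[i - 1]) for i in range(1, n + 1))
    return Module(f"TUPLE({n})", params, ("Tuple",), ops)

triple = tuple_module(3)
```

The point of the sketch is that a whole family of parameterized modules is
obtained from a single function on data, with no pushout construction involved.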

More generally, reflection has made it quite easy to build an environment of 
formal analysis tools for Maude. Such tools, by their very nature, manipulate 
and analyze rewrite theories. By reflection, a rewrite theory R becomes a term 
R in the universal theory, which can be efficiently manipulated by the descent 
functions in the META-LEVEL module. As a consequence, Maude formal tools 
have a reflective design and are built in Maude as suitable extensions of the 
META-LEVEL module. They include the following: 
— the Maude Church-Rosser Checker, and Knuth-Bendix and Coherence
Completion tools

— the Full Maude module composition tool 

— the Maude Predicate Abstraction tool 

— the Maude Inductive Theorem Prover (ITP) 

— the Real-Time Maude tool (discussed in Section 4.1)

— the Maude Sufficient Completeness Checker (SCC) 

— the Maude Termination Tool (MTT) [36]. 





3.3 Object-Oriented Modules 


A declarative treatment of the object paradigm was also a key goal from the very 
beginning of rewriting logic [86], and was more fully realized as part of Maude’s 
language design in [90]. Of course, since concurrent programming was also a key 
goal, the point was to have a declarative way to specify and program concurrent
object systems. This declarative approach, by using subsort overloading and
proposing a key distinction between class inheritance and module inheritance,
also solved an old chestnut in concurrent object-oriented programming, namely
the so-called inheritance anomaly [91].

The essential idea is extremely simple. We view the state of a concurrent 
object system as a “soup” of objects and messages. Mathematically, such a 
soup is modeled as a multiset, built up from the objects and the messages by 
means of a multiset union operator that is associative and commutative and 
has the empty multiset as its identity element. Concurrent interactions between 
objects, and between objects and messages, are then described by means of 
rewrite rules that transform a fragment of such a soup into a new fragment. By 
rewriting logic’s congruence rule [88], many such rewrites can of course take place 
concurrently within the soup. Rules whose lefthand sides involve a single object 
and at most one message are called asynchronous and essentially correspond 
to the Actor model of computation [SE]. Rules whose lefthand sides involve 
more than one object are called synchronous, because such objects have to come 
together synchronously in order for the interaction to take place. 
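
As a rough illustration of the soup view, the following Python sketch models
the state as a multiset (a Counter) of objects and messages and applies one
asynchronous rule, in which an account object consumes a credit message
addressed to it. The account example and the tuple encodings are assumptions
of this sketch, and the rewriting is sequential rather than concurrent.

```python
from collections import Counter

def credit_rule(soup):
    """One asynchronous rewrite: an account object consumes a credit message
    addressed to it. Returns the new soup, or None if the rule does not apply."""
    for msg in soup:
        if msg[0] != "credit":
            continue
        _, target, amount = msg
        for obj in soup:
            if obj[0] == "acct" and obj[1] == target:
                updated = ("acct", target, obj[2] + amount)
                # Replace the matched fragment {obj, msg} by the new fragment.
                return soup - Counter([msg, obj]) + Counter([updated])
    return None

def rewrite(soup, rule):
    """Apply the rule until it no longer matches (one path among many)."""
    while (nxt := rule(soup)) is not None:
        soup = nxt
    return soup

initial = Counter([
    ("acct", "A", 100), ("acct", "B", 5),
    ("credit", "A", 20), ("credit", "A", 20), ("credit", "B", 1),
])
final = rewrite(initial, credit_rule)
```

Because the soup is a multiset, the two identical credit messages for account A
are both present and both consumed; the order of elements plays no role, which
mirrors associativity and commutativity of multiset union.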

More generally, the soup describing the distributed state of an object system 
need not be “flat” but may instead be a “soup of soups” with arbitrary nesting
depth. For example, the Internet is a network of networks and a soup of soups
in exactly this sense. This structuring is very useful, for example for security
and management/monitoring purposes. Carolyn Talcott and I modeled this in
rewriting logic by means of our “Russian dolls” model of concurrent object re- 
flection [100]. The “dolls” in question are meta-objects, which may contain in 
their belly a whole soup of other (meta-) objects, and so on “all the way down.” 
In this way, all kinds of mechanisms for concurrent meta-object reflection can 
be naturally axiomatized, programmed, and reasoned about [100]. The Russian 
dolls model is also useful in clarifying the relationship between object-oriented 
reflection and logical reflection in the sense of Section 3.2. Some object-oriented
reflection mechanisms do not need logical reflection: the hierarchical nesting 
of dolls (meta-object nesting) is enough to express them. But more powerful 
concurrent object reflection mechanisms may use both the nesting of dolls and
logical reflection. For example, the mobility features of Mobile Maude use
both meta-object reflection and logical reflection.

In Maude, concurrent object systems are specified in object-oriented mod- 
ules [903713218]. Such modules provide syntactic sugar supporting all the usual 
object-oriented concepts: objects, object attributes, messages, object classes, and 
multiple class inheritance. Furthermore, they can be parameterized with param- 
eter theories just like any other Maude module. Semantically, all this useful 
syntactic sugar can be stripped away, so that a Maude object-oriented module is 
semantically equivalent to an ordinary rewrite theory, that is, to a corresponding 
Maude system module into which it can be desugared. Operationally, however, 
knowledge of the existence of objects and messages within a multiset represent- 
ing a distributed object state is used by Maude’s frewrite command to support 
a rule, position, and object and message fair rewriting strategy. In conjunction 
with Maude 2.2’s built-in internet sockets feature [19], this provides a very simple 
and elegant way of doing declarative internet programming in Maude, because 
there is no need whatsoever for writing any complicated thread scheduling code, 
which is typically needed when a conventional language is used. 


4 Beyond Maude 


How general and expressive is rewriting logic? The best way to find out is by 
pushing its limits. What follows is a progress report on how, through several 
research collaborations, some of us have been extending rewriting logic and its 
range of applications beyond those of Maude itself so as to encompass: (i) real- 
time and hybrid systems; (ii) probabilistic systems; (iii) deduction with logical 
variables; (iv) higher-order specifications; and (v) behavioral specifications. 


4.1 Real-Time Maude 


In many reactive and distributed systems, real-time properties are essential to 
their design and correctness. Therefore, the question of how systems with real- 
time features can be best specified, analyzed, and proved correct in the semantic 
framework of rewriting logic is an important one. This question has been inves- 
tigated by several authors from two perspectives. On the one hand, an extension 
of rewriting logic called timed rewriting logic has been investigated, and has been 
applied to some examples and specification languages [71,105,125]. On the other
hand, Peter Ölveczky and I have found a simple way to express real-time and
hybrid system specifications directly in rewriting logic [106,108]. Such specifications
are called real-time rewrite theories and have rules of the form


\[ r : \{t\} \xrightarrow{\delta} \{t'\} \ \text{if}\ C \]


with $\delta$ a term denoting the duration of the transition (where the time domain
can be chosen to be either discrete or dense), $\{t\}$ representing the whole state of
a system, encapsulated with {_}, and C an equational condition. Peter Ölveczky
and I have shown that, by making the clock an explicit part of the state, these
theories can be desugared into semantically equivalent ordinary rewrite theories
[106,108,109]. That is, in the desugared version we can model the state of a
real-time or hybrid system as a pair $(\{t\}, \tau)$, with $\{t\}$ the current state, and with $\tau$
the current global clock time. Then the above rule becomes desugared as


\[ r : (\{t\}, \tau) \rightarrow (\{t'\}, \tau + \delta) \ \text{if}\ C \]


Rewrite rules can then be either instantaneous rules, that take no time and only
change some part of the state t, or tick rules, that advance the global time of
the system according to some time expression $\delta$ and may also change the state
t. When time is dense, tick rules may be nondeterministic, in the sense that the
time $\delta$ advanced by the rule is not uniquely determined, but is instead a
parametric expression (however, this time parameter is typically subjected to some
equational condition C). In such cases, tick rules need a time sampling strategy
to choose suitable values for time advance. Besides the fact that a
wide range of known real-time models (including, for example, timed automata,
hybrid automata, timed Petri nets, and timed object-oriented systems) can be
naturally expressed in a direct way in rewriting logic (see [108]), an important
advantage of our approach is that one can use an existing implementation of
rewriting logic to execute and analyze real-time specifications. Because of some
technical subtleties, this seems difficult for the alternative of timed rewriting
logic, although a mapping into our framework does exist [108].
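
The clocked-state reading of tick and instantaneous rules can be sketched in
Python. The countdown timer, the function names, and the particular sampling
strategy (advance by a fixed step, but never past the deadline) are assumptions
chosen for illustration; they are not taken from Real-Time Maude.

```python
from fractions import Fraction

def tick(state, clock, delta):
    """Tick rule: ({timer(x)}, tau) -> ({timer(x - delta)}, tau + delta)
    if 0 < delta <= x. Returns None when the rule does not apply."""
    kind, x = state
    if kind == "timer" and 0 < delta <= x:
        return ("timer", x - delta), clock + delta
    return None

def timeout(state, clock):
    """Instantaneous rule: timer(0) -> done, with the clock left unchanged."""
    if state == ("timer", 0):
        return ("done", 0), clock
    return None

def run(state, clock=Fraction(0), step=Fraction(1, 2)):
    """Execute with a simple time sampling strategy for the tick rule."""
    trace = [(state, clock)]
    while True:
        nxt = timeout(state, clock)
        if nxt is None and state[0] == "timer":
            delta = min(step, state[1])   # sampled value of the time advance
            nxt = tick(state, clock, delta)
        if nxt is None:
            return trace
        state, clock = nxt
        trace.append(nxt)

trace = run(("timer", Fraction(5, 4)))
```

With a step of 1/2 and an initial timer value of 5/4, the run visits clock
values 0, 1/2, 1 and 5/4 and then fires the instantaneous rule without
advancing time. Other clock values are never visited, which is why the choice
of sampling strategy matters for the completeness of any analysis.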

Real-Time Maude [102,107,109,110] is a specification language and a formal
tool built in Maude by reflection. It provides special syntax to specify real-time 
systems, and offers a range of formal analysis capabilities. The Real-Time Maude 
2.1 tool [109,112] systematically exploits the underlying Maude efficient
rewriting, search, and LTL model checking capabilities to both execute and formally
analyze real-time specifications. Reflection is crucially exploited in the Real-Time 
Maude 2.1 implementation. On the one hand, Real-Time Maude specifications 
are internally desugared into ordinary Maude specifications by transforming their 
meta-representations. On the other, reflection is also used for execution and anal- 
ysis purposes. The point is that the desired modes of execution and the formal 
properties to be analyzed have real-time aspects with no clear counterpart at 
the Maude level. To faithfully support these real-time aspects a reflective trans- 
formational approach is adopted: the original real-time theory and query (for 
either execution or analysis) are simultaneously transformed into a semantically 
equivalent pair of a Maude rewrite theory and a Maude query [109,112]. One
important concern about the search and model checking analyses thus performed
by Real-Time Maude is their completeness. Note that not all state-time pairs are 
visited, but only those allowed by the given time sampling strategy. For dense 
time it is even impossible to visit all times. Fortunately, under simple conditions 
on the specification, that are indeed satisfied by almost all examples that have 
been analyzed in Real-Time Maude, the analyses are indeed complete: if the tool 
finds no counterexamples, the given property holds [TI]. 
In practice, Real-Time Maude executions and analyses are quite efficient.
They allow scaling up to highly nontrivial specifications and case studies. In 
fact, both the naturalness of Real-Time Maude to specify large nontrivial real- 
time applications (particularly for distributed object-oriented real-time systems) 
and its effectiveness in simulating and analyzing the formal properties of such 
systems have been demonstrated in a number of substantial case studies,
including: (1) the AER/NCA suite of active network protocols [102,104,113]; (2) the
NORM multicast protocol [74]; (3) the OGDC wireless sensor network algorithm
[134,114]; and (4) the CASH adaptive scheduling algorithm [103]. Real-Time
Maude is freely available. It is a
mature and quite efficient tool, and its source code, a tool manual, examples,
case studies, and papers are all available on its web page.


4.2 PMaude and SHYMaude


Many systems are probabilistic in nature. This can be due either to the uncer- 
tainty of the environment in which they must operate, such as message losses 
and other failures in an unreliable environment, or to the probabilistic nature of 
some of their algorithms, or to both. In general, particularly for distributed sys- 
tems, both probabilistic and nondeterministic aspects may coexist, in the sense 
that different transitions may take place nondeterministically, but the outcomes 
of some of those transitions may be probabilistic in nature. To specify systems of 
this kind, rewrite theories have been generalized to probabilistic rewrite theories 
in [72,73,4]. Rules in such theories are probabilistic rewrite rules of the form


\[ r : t(\vec{x}) \rightarrow t'(\vec{x}, \vec{y}) \ \text{if}\ C(\vec{x}) \ \text{with probability}\ \vec{y} := \pi_r(\vec{x}) \]


where the first thing to observe is that the term $t'$ has new variables $\vec{y}$ disjoint
from the variables $\vec{x}$ appearing in $t$. Therefore, such a rule is nondeterministic;
that is, the fact that we have a matching substitution $\theta$ for the variables $\vec{x}$ such
that $\theta(C)$ holds does not uniquely determine the next state fragment: there can
be many different choices for the next state, depending on how we instantiate
the extra variables $\vec{y}$ in $t'$. In fact, we can denote the different such next states
by expressions of the form $t'(\theta(\vec{x}), \rho(\vec{y}))$, where $\theta$ is fixed as the given
matching substitution, but $\rho$ ranges over all the possible substitutions for the new
variables $\vec{y}$. The probabilistic nature of the rule is expressed by the notation:
with probability $\vec{y} := \pi_r(\vec{x})$, where $\pi_r(\vec{x})$ is a probability distribution which
may depend on the matching substitution $\theta$. We then choose the values for $\vec{y}$, that
is, the substitution $\rho$, probabilistically according to the distribution $\pi_r(\theta(\vec{x}))$.
The fact that the probability distribution may depend on the substitution $\theta$
can be illustrated by means of a simple example. Consider a battery-operated
clock. We may represent the state of the clock as a term clock(T,C), with T a
natural number denoting the time, and C a positive real denoting the amount
of battery charge. Each time the clock ticks, the time is increased by one unit,
and the battery charge slightly decreases; however, the lower the battery charge,
the greater the chance that the clock will stop, going into a state of the form
broken(T,C’). We can model this system by means of the probabilistic rewrite
rule


rl [tick]: clock(T,C) => if B then clock(s(T),C - (C / 1000)) 
else broken(T,C - (C / 1000)) 
fi 
with probability B := BERNOULLI(C / 1000) 


that is, the probability of the clock breaking down instead of ticking normally de- 
pends on the battery charge, which is here represented by the battery-dependent 
bias of the coin in a Bernoulli trial. Note that here the new variable on the rule’s 
righthand side is the Boolean variable B, corresponding to the result of tossing
the biased coin. As shown in [72], probabilistic rewrite theories can express a
wide range of models of probabilistic systems, including continuous-time Markov
chains [131], probabilistic non-deterministic systems [119,123], and generalized
semi-Markov processes [48]; they can also naturally express probabilistic
object-based distributed systems [73,4], including real-time ones. Yet another class of
probabilistic models that can be simulated by probabilistic rewrite theories is 
the class of object-based stochastic hybrid systems discussed in [99]. 

The PMaude language [73,4] is an experimental specification language whose
modules are probabilistic rewrite theories. Note that, due to their nondetermin- 
ism, probabilistic rewrite rules are not directly executable. However, probabilistic 
systems specified in PMaude can be simulated in Maude. As explained in [4,93],
this is accomplished by transforming a PMaude specification into a corresponding
Maude specification in which actual values for the new variables appearing
in the righthand side of a probabilistic rewrite rule are obtained by sampling 
the corresponding probability distribution functions using standard techniques 
based on random number generation and Maude’s built-in COUNTER and RANDOM 
modules. 
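
The effect of this transformation on the tick rule for the battery-operated
clock can be mimicked in Python, with the standard random module standing in
for Maude's COUNTER and RANDOM modules. This is a sketch under assumptions: the
tuple encoding of states, the function names, and the max_ticks bound are
invented for the example, and the charge C is assumed to lie between 0 and
1000 so that C / 1000 is a valid probability.

```python
import random

def tick(state, rng):
    """Sampled version of the tick rule; returns None on broken(...) states."""
    kind, t, c = state
    if kind != "clock":
        return None
    b = rng.random() < c / 1000      # sampled value for B := BERNOULLI(C / 1000)
    c_next = c - c / 1000            # the charge decreases in both branches
    return ("clock", t + 1, c_next) if b else ("broken", t, c_next)

def simulate(state, rng, max_ticks=100_000):
    """One simulation run: apply the sampled rule until the clock breaks."""
    for _ in range(max_ticks):
        nxt = tick(state, rng)
        if nxt is None:
            break
        state = nxt
    return state

final = simulate(("clock", 0, 500.0), random.Random(1))
```

Once a value for B has been sampled, the righthand side contains no new
variables, so each step is an ordinary deterministic rewrite.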

In general, provided that sampling for the probability distributions used in a
PMaude module is supported in the underlying infrastructure, we can associate
to it a corresponding Maude module. We can then use this associated Maude
module to perform Monte Carlo simulations of the probabilistic systems thus
specified. As explained in [4], provided all nondeterminism has been eliminated
from the original PMaude module⁷, we can then use the results of such Monte
Carlo simulations to perform a statistical model checking analysis of the given
system to verify certain properties. For example, for a PMaude specification of a
TCP/IP protocol variant that is resistant to Denial of Service (DoS) attacks, we
may wish to establish that, even if an attacker controls 90% of the network
bandwidth, it is still possible for the protocol to establish a connection in less than
30 seconds with 99% probability. Properties of this kind, including properties
that measure quantitative aspects of a system, can be expressed in the QuaTEx
probabilistic temporal logic [4], and can be model checked using the VeStA tool
[124]. See [2] for a substantial case study specifying a DoS-resistant TCP/IP
protocol as a PMaude module, performing Monte Carlo simulations by means of
its associated Maude module, and formally analyzing in VeStA its properties,
expressed as QuaTEx specifications, according to the methodology just described.
More recently, several object-based stochastic hybrid system case studies have
been specified in an extension of both PMaude and Real-Time Maude called
SHYMaude and have been simulated in Maude. Relevant formal properties
for each case study, expressed as QuaTEx specifications, have been statistically
model checked in VeStA using Monte Carlo simulations performed in Maude
[99].

⁷ The point is that, as explained above, in general, given a probabilistic rewrite theory
and a term t describing a given state, there can be several different rewrites, perhaps
with different rules, at different positions, and with different matching substitutions,
that can be applied to t. Therefore, the choice of rule, position, and substitution is
nondeterministic. To eliminate all nondeterminism, at most one rule at exactly one
position and with a unique substitution should be applicable to any term t. As
explained in [4], for many systems, including probabilistic real-time object-oriented
systems, this can be naturally achieved, essentially by scheduling events at real-valued
times that are all different, because we sample a continuous probability distribution
on the real numbers.
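
To give a feel for how Monte Carlo simulation yields quantitative estimates,
the following Python sketch estimates the probability that the battery-operated
clock of this section completes a given number of ticks. It is a plain
frequency estimate over independent runs, which is an assumption of this sketch
and not a description of how VeStA or QuaTEx work; the function names, sample
size, and seed are likewise illustrative.

```python
import random

def survives(n_ticks, charge, rng):
    """One simulation run: does the clock tick n_ticks times without breaking?"""
    for _ in range(n_ticks):
        if rng.random() >= charge / 1000:   # Bernoulli trial with bias C / 1000 fails
            return False
        charge -= charge / 1000             # battery drains on every tick
    return True

def estimate(n_ticks, charge, samples, seed=0):
    """Fraction of independent runs in which the clock survives n_ticks."""
    rng = random.Random(seed)
    hits = sum(survives(n_ticks, charge, rng) for _ in range(samples))
    return hits / samples

p_hat = estimate(n_ticks=10, charge=1000.0, samples=5000)
```

For this small example the exact value is available for comparison: starting
from charge 1000, the charge after k ticks is 1000 times 0.999 to the power k,
so the probability of surviving ten ticks is 0.999 to the power 45 (the sum of
the exponents 0 through 9), roughly 0.956.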


4.3 Narrowing: Eqlog Revisited 


Narrowing is a symbolic procedure like rewriting, except that rules, instead 
of being applied by matching a subterm, are applied by unifying the lefthand 
side with a nonvariable subterm. Traditionally, narrowing has been used as a 
method to solve equations in a confluent and terminating equational theory. In 
rewriting logic, Prasanna Thati and I have generalized narrowing to a procedure 
for symbolic reachability analysis [132]. That is, instead of solving equational 
goals $\exists \vec{x}.\ t = t'$, we solve reachability goals $\exists \vec{x}.\ t \rightarrow t'$, stating that there
is an instance of t from which we can reach an instance of t’ by rewriting with
rules R modulo equations E.
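
The difference between matching and unification can be made concrete with a
small Python sketch of a single narrowing step. It is deliberately simplified
and rests on stated assumptions: unification is purely syntactic (no equations
E), rule variables are assumed to be distinct from the variables of the term
being narrowed, and the term encoding (strings for variables, tuples for
applications) is invented for this example.

```python
def is_var(t):
    return isinstance(t, str)            # variables are strings, e.g. "Z"

def walk(t, s):
    while is_var(t) and t in s:          # follow variable bindings
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if is_var(t):
        return t == v
    return any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s):
    """Syntactic unification; returns an extended substitution or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def apply(t, s):
    t = walk(t, s)
    if is_var(t):
        return t
    return (t[0],) + tuple(apply(a, s) for a in t[1:])

def positions(t, path=()):
    """Yield (path, subterm) for every nonvariable position of t."""
    if is_var(t):
        return
    yield path, t
    for i, a in enumerate(t[1:], start=1):
        yield from positions(a, path + (i,))

def replace(t, path, new):
    if not path:
        return new
    i = path[0]
    return t[:i] + (replace(t[i], path[1:], new),) + t[i + 1:]

def narrow(t, rules):
    """All one-step narrowings of t: unify a rule's lefthand side with a
    nonvariable subterm, replace it by the righthand side, and instantiate."""
    for path, sub in positions(t):
        for lhs, rhs in rules:
            s = unify(sub, lhs, {})
            if s is not None:
                yield apply(replace(t, path, rhs), s), s

ZERO = ("0",)
RULES = [
    (("add", ZERO, "Y"), "Y"),                              # add(0, Y) -> Y
    (("add", ("s", "X"), "Y"), ("s", ("add", "X", "Y"))),   # add(s(X), Y) -> s(add(X, Y))
]
term = ("add", "Z", ("s", ZERO))                            # add(Z, s(0)), Z a variable
results = list(narrow(term, RULES))
```

Rewriting by matching cannot apply either rule to add(Z, s(0)), since Z is a
variable. Narrowing instead yields two successors, one instantiating Z to 0 and
one instantiating Z to s(X), and each successor carries the substitution that
made the step possible.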

For arbitrary rewrite theories narrowing, though sound, is not a complete 
procedure [132]. However, for large classes of theories of interest, including theo- 
ries specifying distributed object systems, narrowing is complete and provides a 
complete semidecision procedure for solving reachability problems [132]. Further 
recent work on narrowing with rewrite theories focuses on: (1) generalizing the 
procedure to so-called “back-and-forth narrowing,” so as to ensure completeness 
under very general assumptions about the rewrite theory R [133]; and (2) effi- 
cient lazy strategies to restrict as much as possible the narrowing search space 
[45].

Narrowing with rewrite theories has important applications to the analysis of 
cryptographic protocols. A relevant point is that, since narrowing with a rewrite 
theory R = (Σ, E, R) is performed modulo the equations E, this allows more
sophisticated analyses than those performed under the usual Dolev-Yao “perfect 
cryptography assumption.” It is well-known that protocols that had been proved 
secure under this assumption can be broken if an attacker uses knowledge of 
the algebraic properties satisfied by the underlying cryptographic functions. In 
rewriting logic we can specify a cryptographic protocol with a type of rewrite 
theory R = (Σ, E, R) for which narrowing is complete, and can model those
algebraic properties as equations in E. Very recent work in this direction by
Escobar, Meadows and myself uses rewriting logic and narrowing to give
a precise rewriting semantics to the inference system of one of the most effective
analysis tools for cryptographic protocols, namely the NRL Protocol Analyzer
[84].

Equational narrowing is a special case of rewriting logic narrowing, namely 
the case where we solve reachability goals of the form $\exists \vec{x}.\ \mathit{equal}(t, t') \rightarrow \mathit{true}$
using the equations E as rewrite rules and adding the extra rule
$\mathit{equal}(x, x) \rightarrow \mathit{true}$. Furthermore, Horn logic with equality can be conservatively embedded in
rewriting logic [89,81]. Indeed, in this embedding narrowing with the resulting
rewrite theory is complete and agrees with SLD resolution modulo the equations 
E. This means that we reencounter our old friend Eqlog within the broader 
perspective of rewriting logic narrowing. 





4.4 The Open Calculus of Constructions 


Rewriting logic is an expressive logical framework, in which many other logics 
can be naturally represented [81]. Furthermore, by exploiting its reflective fea- 
tures in conjunction with the inductive nature of initial models, it has also good 
properties as a meta-logical framework, so that we can not only represent logics, 
but can also reason within the framework about their meta-logical properties 
BIG]. 

But how good and general is it anyway? For example, how does it compare 
with the higher-order type theory formalisms that have been proposed by dif- 
ferent authors as logical frameworks? Mark-Oliver Stehr and I tried to give an 
answer to this question using transitivity of representation mappings. If we could 
show that a higher-order type theory can be easily and naturally represented in 
rewriting logic in a conservative way, then any representation of a logic into such 
a type theory would automatically yield one in rewriting logic by composition. 
This would not be the simplest representation of that logic that one could define 
directly in rewriting logic, but it would prove that anything one can represent in 
the higher-order framework can likewise be represented in rewriting logic. Even 
so, some people might still be skeptical. Maybe you did it for Martin-Löf type 
theory, but how do I know that you can also do it for the Calculus of Constructions?
All this could be dragged on ad nauseam. So, what Mark-Oliver and I did in
[130] was to specify a single parametric map (using parameterization in Maude)
faithfully representing pure type systems (PTS) into rewriting logic. Since
pure type systems encompass a large class of type theories with simple types, 
type parameters, and type families, including the lambda cube, our skeptical 
colleagues would now have to come up with more exotic type theories outside 
the PTS general fold. At the meta-logical framework level, a careful comparison 
with higher-order type theories used for that purpose was given by David Basin, 
Manuel Clavel and myself in [6]. 

In fact, Mark-Oliver and I defined in [130] several representation mappings for 
pure type systems at different levels of abstraction. The more abstract,
textbook-like representation mapped isomorphically the textbook syntax of pure
type systems. But in order to give a more computational representation that would take
care automatically of all the binding and substitution paraphernalia, we also 
gave a more concrete representation using Mark-Oliver’s CINNI calculus of ex- 
plicit substitutions [126] and showed it equivalent to the textbook one. Similarly, 
typing inference systems were represented in Maude in a computational way by 
means of rewrite rules [130]. This more concrete representation map was used 
by Mark-Oliver in his thesis [127] to implement in Maude his Open Calculus 
of Constructions (OCC). Since the Coquand-Huet calculus of constructions
(CC) [28] is one of the instances of pure type systems, one could of
course obtain an implementation of CC in Maude that way. But Mark-Oliver 
went considerably further. One of the sore points with higher-order type theories 
is their very limited and awkward way of dealing with equalities: an equational 
reasoning system like Maude can perform millions of equational deduction steps 
automatically in a second; but to represent such deduction steps within a given 
constructive type theory one needs to justify each of those equality steps con- 
structively. By generating proof objects for the deductions of an external tool, 
for example for membership equational logic deduction [121], one can partly
get around the problem. But Mark-Oliver’s solution was more radical. By drop- 
ping the constructive interpretation and allowing simple set-theoretic models for 
OCC, he solved this problem directly: equality steps are allowed inside OCC, 
even modulo axioms like associativity and commutativity. Furthermore, OCC 
distinguishes several notations for equality, making clear whether they can be 
handled automatically by equational simplification, or need to be performed by 
explicit deduction steps. Likewise, a notation for relations representing rewrite 
rules in the rewriting logic sense is also provided. All this means that OCC can 
be viewed as a natural conceptual unification of the Calculus of Constructions 
and of rewriting logic. In particular, Maude can be naturally regarded as a
sublanguage of OCC. As shown in [127,128,129], all the nice reasoning capabilities
of the Calculus of Constructions, including its extensions with inductive and
co-inductive principles, can be represented in OCC, which can carry out highly
nontrivial proof tasks [127,128,129].


4.5 BMaude 


In some sense, Maude, and languages like CafeOBJ [46] and BOBJ [54] that 
support hidden logic and behavioral equivalence, push the envelope in differ- 
ent directions of the specification language design space. Yet, there is a natural 
question about how these languages are all related. For example, both Maude 
and those languages have equational logic sublanguages. CafeOBJ itself provided 
some answer to this question in the form of the CafeOBJ “cube” of institutions 
[46], in which equational logic, hidden equational logic, and rewriting logic are 
related and unified. But the unification of rewriting logic and hidden logic
proposed in [29] and used in [46,31] has some limitations regarding its model theory,
and the matter seems to deserve further research. 

While leaving open the issue of whether a more satisfactory unification of 
hidden logic and rewriting logic can be found, what Grigore Roşu and I did 
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in [96] was to develop a hidden/behavioral extension of membership equational 
logic called behavioral membership equational logic. We were interested in this 
extension because of theoretical and practical reasons. Theoretically, the greater 
generality and expressiveness of MEL over, say, order-sorted equational logic re- 
sulted in a more expressive behavioral logic. Practically, the reflective features of 
Maude make it easy to develop an extension of Maude called BMaude in which 
theories in behavioral membership equational logic can be specified as modules, 
and to support deduction in such modules by behavioral rewriting [120,122].
Work ahead in this direction includes passing from the present theoretical foun- 
dations and BMaude language design to a prototype implementation, and finding 
a more general behavioral extension of rewriting logic itself. 


5 Conclusions 


Science is a dialogue. This gets somewhat distorted by the unidirectional charac- 
ter of publications, including this one; and by the impossibility of making always 
explicit the many influences shaping our ideas. This festive occasion provides an 
opportunity for reflecting, with gratitude, on such influences; and for looking in 
hindsight at the road already traveled, and forward to the ways ahead. I have 
tried to do a little of all this from a limited and subjective perspective, but one 
that I am at least very familiar with: some of the ways in which the OBJ, Eqlog, 
and FOOPS ideas have influenced Maude. And some of the directions in which 
the current Maude ideas are expanding. 

One way to wrap all this up is with a picture describing the relationship 
between the different languages I have been discussing. I call it a language ge- 
nealogy. Solid lines describe language inclusions (or near inclusions). Dashed 
lines describe a weaker relationship, namely one of influence between different 
languages. Not all influences are reflected in the picture: to avoid too much clut- 
tering, only those that I think are more direct are depicted. One point to bear in 
mind is that some of these languages are currently under construction, or even 
in their design phase. For example, only a first prototype of PMaude exists at 
present, and BMaude and SHYMaude are only language designs at this point. 

Acknowledgments. In this paper I have reflected on some of the ways in
which Joseph’s ideas have influenced mine. But there are many others, both 
scientific and nonscientific: so much so that an actual enumeration would be 
both impossible and futile. It is with deep gratitude that I wish to thank Joseph, 
not only for his ideas and his example, but above all for his friendship. I have 
already mentioned by name all the colleagues who were involved in the OBJ1-3 
collaborations. To all of them I also extend my sincere thanks. 
time rewrite theories is joint work with Peter Ölveczky at the University of Oslo;


Abstract. Goguen and Malcolm specify semantics of programming languages
in OBJ. Here, we consider how the extensibility and reusability
of their specifications could be improved. We propose using the notation 
and modular structure of the Constructive Action Semantics framework 
in OBJ, and give a simple illustration. The reader is assumed to be fa- 
miliar with OBJ. 


1 Introduction 


Conventional semantic descriptions of programming languages suffer from poor 
modularity. In denotational semantics, for instance, descriptions are usually di- 
vided into three sections, defining (abstract) syntax, semantic entities, and se- 
mantic functions. The semantic functions map parts of programs composition- 
ally to their denotations (which are themselves usually functions) and are de- 
fined inductively by so-called semantic equations. All the definitions have global 
visibility throughout the description of a particular language. Moreover, when 
developing a denotational semantics, adding a new construct to the syntax of the 
described language may require extensive reformulation of the definition of the 
semantic functions. The need for reformulation can be largely eliminated using 
monadic notation instead of pure λ-notation, but there are still no named modules
that could be reused or extended in semantic descriptions of other languages.
Similarly, conventional structural operational semantics lacks explicit modules, 
and adding a new construct may require reformulating all the previously-defined 
rules to take account of a new component of configurations (although the latter 
problem can be eliminated rather easily using MSOS [12]). 

Goguen and Malcolm [3] specify semantics of programming languages us- 
ing OBJ [4]. Their descriptions are a hybrid of denotational, operational, and 
algebraic semantics. Importantly, the OBJ system supports validation of the 
semantic description by running programs and proving properties about them. 
The introduction of named modules with restrictions on the visibility of their 
definitions helps to identify which parts could be affected when a definition 
is changed. However, just as with conventional denotational and operational 
semantics, adding a new construct to a language may still require extensive 
reformulation of the description of the original constructs. Their modules are 
also quite large, and unlikely to be reused directly in descriptions of different 
languages. These points will be discussed further and illustrated in Sect. 2.

We propose to improve the extensibility of semantics in OBJ by introducing
actions. Actions are used as the denotations of programming constructs in
Action Semantics [8,9,14], and expressed using Action Notation (AN). This notation
plays the same role in action semantic descriptions as λ-notation does in
conventional denotational semantics, but also provides primitives and combina- 
tors to specify control and data flow, scopes of bindings, effects on storage, and 
interactive processes. The design of AN is such that previous specifications of de- 
notations never need reformulation when adding a new construct to the described 
language. For instance, the specification of the action semantics of an arithmetic 
expression can remain the same, regardless of whether the sub-expressions might 
have side-effects, raise exceptions, spawn processes, be non-deterministic, or diverge.
Further details and illustrations will be given in Sect. 3.


Using AN dramatically improves the reusability of parts of semantic descrip- 
tions. For instance, the semantic equations defining the denotations of arithmetic 
expressions could now be reused verbatim in descriptions of different languages. 
However, specification reuse by copying has major disadvantages: it leaves no 
trace of the origin of the specification, and it is not apparent whether the copied 
specification has been subsequently edited. Verbatim reuse should be made ex- 
plicit by referring directly to the original specification. 


To maximize the possibility of verbatim reuse, we propose two further changes 
to the style of specifying semantics in OBJ. Both are rather radical, and form 
the basis for a novel approach to developing semantic descriptions called Con- 
structive Semantics [13]. The first change concerns modular structure, where we 
intend to use a separate module for the description of each individual language 
construct. Such a module contains a single semantic equation, specifying the 
action semantics of the construct concerned using AN to combine the action 
semantics of its components. The second change is to map the concrete syntax 
of each language construct to the constructs of a language-independent abstract 
syntax. The semantics of each concrete construct is then derived by composing 
this map with the semantics of the abstract construct. Complex concrete con- 
structs can be mapped to combinations of simpler abstract constructs; this is 
similar to reducing a language to its kernel constructs, except that we do not 
insist on the abstract constructs being themselves directly expressible in the 
concrete language, nor that the abstract constructs are themselves irreducible. 
Specification of constructive semantics in OBJ will be illustrated in Sect. 4.


The Constructive Action Semantics framework was originally developed in 
collaboration with Doh [1], and further enhanced in collaboration with van den
Brand and Iversen [15]. A constructive action semantics for Core ML has been 
specified together with Iversen [6], and used for semantics-based compiler gener- 
ation [5]. A constructive version of MSOS [12] has been used in connection with 
teaching operational semantics [10]. The general architecture of constructive se- 
mantics is advocated as a useful paradigm for the development of any kind of 
truly modular semantics [11,13].


The present paper is based on tentative experiments using OBJ. It includes 
excerpts from the full specifications developed by the author, which are available 
for downloading at http://www.cs.swan.ac.uk/~cspdm/Goguen-F'S/. Please note
that the author is not an expert user of OBJ; suggestions for improvements to
the use of OBJ in the specifications are welcome.


2 Algebraic Semantics in OBJ 


We shall start by recalling how Goguen and Malcolm specify the semantics of 
imperative programming languages algebraically in OBJ [3]. We shall then assess 
the extensibility and reusability of such semantic descriptions. 


2.1 A Simple Example 


Goguen and Malcolm introduce a small language called Simple, and specify both 
its syntax and semantics in OBJ. Some excerpts from their specification are given 
below (the full OBJ code is available at
http://www.cs.ucsd.edu/users/goguen/sys/codel.html).

Goguen and Malcolm start by specifying the data types of Simple. For tech- 
nical reasons (and to facilitate proofs), they specify a module ZZ which enriches 
the built-in module INT with the operation _is_ : Int Int -> Bool, and with 
various equations and conditional equations concerning integer operations and 
relations. 

OBJ allows rather general mixfix operation symbols. Goguen and Malcolm 
exploit this to declare abstract syntax constructors that correspond to a grammar 
for concrete syntax, so that concrete programs can be parsed as terms by OBJ: 


obj EXP is pr ZZ .
dfn Var is QID .
sorts Exp Arvar Arcomp .
subsorts Var Int Arcomp < Exp .
ops a b c : -> Arvar .
op _+_ : Exp Exp -> Exp [prec 10] .
op _*_ : Exp Exp -> Exp [prec 8] .
op -_ : Exp -> Exp [prec 1] .
op _-_ : Exp Exp -> Exp [prec 10] .
op _[_] : Arvar Exp -> Arcomp [prec 1] .
endo


Goguen and Malcolm specify stores in terms of their relationship to expres- 
sions, Boolean-valued tests, and assignment statements: 


th STORE is pr BPGM .
pr ARRAY .
sort Store .
op initial : -> Store .
op _[[_]] : Store Exp -> Int [prec 65] .
op _;_ : Store BPgm -> Store [prec 60] .
var S : Store .
vars X1 X2 : Var .


eq S [[E1 + E2]] = (S[[E1]]) + (S[[E2]]) .
eq S ; X1 := E1 [[X1]] = S [[E1]] .
cq S ; X1 := E1 [[X2]] = S [[X2]] if X1 =/= X2 .
eq S ; X1 := E1 [[AV]] = S [[AV]] .
endth


Goguen and Malcolm’s specification of the semantics of structured programs 
in Simple is formulated as equations between store terms, but the equations can 
easily be understood operationally: 


obj SEM is pr PGM .
pr STORE .
sort EStore .
subsort Store < EStore .
op _;_ : EStore Pgm -> EStore [prec 60] .
var S : Store .
var T : Tst .
var P1 P2 : Pgm .


eq S ; skip = S .
eq S ; (P1 ; P2) = (S ; P1) ; P2 .
cq S ; if T then P1 else P2 fi = S ; P1
   if S[[T]] .
cq S ; if T then P1 else P2 fi = S ; P2
   if not(S[[T]]) .
cq S ; while T do P1 od = (S ; P1) ; while T do P1 od
   if S[[T]] .
cq S ; while T do P1 od = S
   if not(S[[T]]) .
endo


(The supersort EStore is introduced because of the possibility of non-terminating
while-programs.)
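
Read operationally, the equations of STORE and SEM describe a recursive store transformer. The following Python sketch is an illustration only and is not part of Goguen and Malcolm's specification: the encoding of programs as nested tuples, the representation of stores as dictionaries, the names `lookup` and `run`, and the choice that unset variables read as 0 are all assumptions made here.

```python
# Illustrative sketch only: programs are nested tuples, stores are dicts.
# None of these names come from the OBJ specification.

def lookup(store, exp):
    """S[[E]]: evaluate a side-effect-free expression or test in a store."""
    if isinstance(exp, (bool, int)):
        return exp
    if isinstance(exp, str):                 # a variable (assumed to default to 0)
        return store.get(exp, 0)
    op, e1, e2 = exp
    v1, v2 = lookup(store, e1), lookup(store, e2)
    return {"+": v1 + v2, "*": v1 * v2, "-": v1 - v2, "<": v1 < v2}[op]

def run(store, pgm):
    """S ; P: the store that results from running P in S."""
    tag = pgm[0]
    if tag == "skip":                        # S ; skip = S
        return store
    if tag == "assign":                      # (S ; X := E)[[X]] = S[[E]]
        _, var, exp = pgm
        return {**store, var: lookup(store, exp)}
    if tag == "seq":                         # S ; (P1 ; P2) = (S ; P1) ; P2
        return run(run(store, pgm[1]), pgm[2])
    if tag == "if":                          # choose a branch according to S[[T]]
        _, tst, p1, p2 = pgm
        return run(store, p1 if lookup(store, tst) else p2)
    if tag == "while":                       # unrolls the recursive while equation
        _, tst, body = pgm
        while lookup(store, tst):
            store = run(store, body)
        return store
    raise ValueError(f"unknown construct: {tag}")
```

A non-terminating while-program simply makes `run` loop forever, which is the situation that the supersort EStore accounts for in the OBJ version. Note that `lookup` never changes the store: this is exactly the assumption about side-effect-free expressions discussed in Sect. 2.2.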


2.2 Extensibility 


Goguen and Malcolm’s semantics of Simple is algebraic (in the sense that it 
is specified as an initial algebra, using algebraic axioms). It also appears to be 
reasonably modular. But how extensible is it? Can we expect to be able to keep 
it largely unchanged when adding new constructs to the described language? 

Inspection of the main modules STORE and SEM reveals that their formulation 
depends on two assumptions: 
— expressions do not have side-effects, and
— the store is the only auxiliary information processed by expressions and 
structured programs. 


If we were to extend Simple by adding side-effects to expressions, representing 
local bindings by environments, allowing expressions to raise exceptions, or many 
other language features, we would violate one or both of these assumptions. 

The need to reformulate large parts of semantic descriptions when adding 
new features to language constructs is familiar from conventional denotational 
and operational semantics. In the next section, we shall see how the use of 
the action notation provided by action semantics can avoid the need for such 
reformulation, and ensure better extensibility. 


2.3 Reusability 


To what extent could parts of Goguen and Malcolm’s semantics of Simple be 
reused in descriptions of different languages? For instance, suppose we were to 
describe the semantics of a corresponding sublanguage of C (with expressions 
restricted to have no side-effects, etc.); which modules would we be able to reuse 
verbatim? 

Clearly, the modules specifying the data types of a programming language 
are highly reusable. For example, the module ZZ that specifies (an enrichment of) 
the usual integers would probably be appropriate in the semantic descriptions 
of most programming languages; any further operations required could easily be 
added after importing it. 

However, several aspects of the other modules significantly hinder their reuse: 


— The notation for expressions and structured programs is intended to reflect 
their concrete syntax. OBJ allows notation to be changed when importing 
a module, but when widespread changes would be needed (e.g., when going 
from the syntax for structured programs in Simple to that in C) it would 
surely be simpler and more perspicuous to copy and edit the original modules 
than to import them and specify the renaming of operations. 

— The module hierarchy is relatively deep. If a module such as EXP in Goguen 
and Malcolm’s example semantics were to be reused by importing and en- 
riching it, any module that imports EXP would require a corresponding en- 
richment — unless it was copied and edited to refer to the module importing 
and enriching EXP. 

— Particular sets of constructs are bundled together in the same module. OBJ 
does not allow operations to be hidden, so for instance when the module 
EXP is imported, the concrete syntax for array variables (Arvar) and array 
components (Arcomp) is included, whether one wants it or not. It appears 
that copying and editing is the only way of removing declared operations. 


In Sect. 4 we shall see how to remove all the above hindrances to reuse.


3 Action Semantics in OBJ


Action Semantics is a hybrid of denotational and operational semantics. As 
usual in denotational semantics, semantic functions map programs and their 
components to denotations that represent their contribution to overall program 
behaviour. The semantic functions are compositional (i.e., the denotation of a 
construct depends only on the denotations of its components) and defined induc- 
tively by semantic equations. The main difference between action semantics and 
conventional denotational semantics is that denotations in action semantics are 
so-called actions, rather than higher-order functions on domains. The notation 
used to express actions, called simply Action Notation (AN), is itself defined operationally,
in contrast to the λ-notation used in denotational semantics, which
has a pure mathematical interpretation. When performed, actions may be given 
and compute data, refer to bindings, inspect and update storage; they may ter- 
minate normally, terminate exceptionally, fail, or never terminate. As we shall 
see below, AN is quite expressive, and provides primitives and combinators for 
specifying data and control flow, scopes of bindings, and effects on storage.¹


3.1 Data 


The items of data processed by actions consist of the usual data types of pro- 
gramming languages (numbers, arrays, etc.) together with identifiers, environ- 
ments (representing bindings), storage cells (locations), and entities representing 
various kinds of procedural and data abstraction (such as packages and classes). 
Actions are given and may return arbitrary finite sequences of such data. The 
constructors for such sequences can be declared in OBJ as follows: 


obj DATA is 
sorts Data Datum . 
subsorts Datum < Data . 


op no-data : -> Data . 
op _ , _ : Data Data -> Data [assoc id: no-data] .
endo 


All other operations on Data are represented by constants of sort Op: 


obj DATA/OP is 


ex DATA . 

sort Op . 

op _ ! _ : Op Data -> Data . *** result of application
op _ ? _ : Op Data -> Bool . *** definedness of result 
endo 


Note that also subsorts of Data are represented by constants (written in low- 
ercase, e.g. datum for the subsort Datum), and the corresponding retracts are 
represented by applying the operation ‘the’ to them (e.g. ‘the datum’). 
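
As a rough illustration of these declarations, data sequences and operation constants might be modelled in Python as follows. This is a sketch under assumptions, not a rendering of the author's OBJ modules: the class `Op`, the helper `all_ints`, and the sample operations `plus` and `div` are hypothetical names introduced here.

```python
# Illustrative sketch: a Data value is a tuple of items; no-data is the empty tuple.
NO_DATA = ()

def cat(d1, d2):
    """The sequence constructor _ , _ : associative, with NO_DATA as identity."""
    return d1 + d2

class Op:
    """An operation on Data, named by a constant, with an explicit domain."""

    def __init__(self, name, defined, apply):
        self.name = name
        self._defined = defined
        self._apply = apply

    def is_defined(self, data):
        """Plays the role of  O ? D  (definedness of the result)."""
        return self._defined(data)

    def apply(self, data):
        """Plays the role of  O ! D  (result of application)."""
        if not self.is_defined(data):
            raise ValueError(f"{self.name} is not defined on {data!r}")
        return self._apply(data)

def all_ints(data, n):
    """True when data is a sequence of exactly n integers."""
    return len(data) == n and all(
        isinstance(x, int) and not isinstance(x, bool) for x in data)

# Two hypothetical operations on sequences of integers.
plus = Op("plus", lambda d: all_ints(d, 2), lambda d: (d[0] + d[1],))
div = Op("div", lambda d: all_ints(d, 2) and d[1] != 0, lambda d: (d[0] // d[1],))
```

Keeping definedness separate from application mirrors the two OBJ operations: an action can test `O ? D` before committing to `O ! D`.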


1 AN also provides primitives for asynchronous threads and message-passing, but these 
are omitted here. 
3.2 Kernel AN


The primitives and combinators of the kernel of AN are declared below:


obj KERNEL-AN is 
pr DATA DATA/BOOL DATA/INT DATA/SEQ DATA/BINDINGS DATA/STORE . 
sorts Action Atomic-Action . 


subsorts Atomic-Action < Action . 

op copy : -> Atomic-Action . 
op result _ : Data -> Atomic-Action . 
op skip : -> Atomic-Action . 
eq skip = result no-data . 

op give _ : Op -> Atomic-Action . 
op _ then _ : Action Action -> Action [assoc] .

op _ and _ : Action Action -> Action [assoc] .

op _ and-then _ : Action Action -> Action [assoc] .

op indivisibly _ : Action -> Action . 


The difference between then and and-then is that in A1 then A2, the data com- 
puted by A1 is passed to A2, whereas in A1 and-then A2, the data computed by 
A1 is concatenated with that computed by A2. The difference between and-then 
and and is that the former insists on sequential execution, whereas the latter 
leaves the order unspecified, allowing interleaving. 
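
The data flow of these three combinators can be sketched in Python, restricted to normal termination. This is an illustration under assumptions rather than the OBJ specification: actions are modelled as functions from a tuple of given data to a tuple of computed data, and the names `then`, `and_then`, `and_`, `copy`, `result`, and `plus` are chosen here.

```python
def then(a1, a2):
    """A1 then A2: the data computed by A1 is given to A2."""
    return lambda data: a2(a1(data))

def and_then(a1, a2):
    """A1 and-then A2: both are given the same data, A1 is performed first,
    and the computed data are concatenated."""
    def act(data):
        d1 = a1(data)
        d2 = a2(data)
        return d1 + d2
    return act

def and_(a1, a2):
    """A1 and A2: same data flow as and-then, but the order is unspecified.
    This sketch simply picks left-to-right, one of the admissible orders."""
    return and_then(a1, a2)

def copy(data):
    """Gives back the data it was given."""
    return data

def result(d):
    """Ignores the given data and computes d."""
    return lambda data: d

def plus(data):
    """Stands for 'give plus' applied to a pair of numbers."""
    return (data[0] + data[1],)

# (copy and copy) then give plus: doubles the single number it is given.
double = then(and_(copy, copy), plus)
```

Because the actions in this sketch are pure functions, the difference between `and_` and `and_then` is not observable; it matters only once actions can affect storage.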


op throw : -> Atomic-Action . 
op thrown _ : Data -> Atomic-Action . 
op err : -> Atomic-Action .
eq err = thrown no-data . 

op _ catch _ : Action Action -> Action [assoc] .

op _ and-catch _ : Action Action -> Action [assoc] .


The above notation is used for actions that can terminate exceptionally, throwing 
data. Note that when the given data is not in the domain of definition of an 
operation O, the outcome of give O is the same as that of err.


op fail : -> Atomic-Action .
op check _ : Op -> Atomic-Action .
op _ else _ : Action Action -> Action [assoc] .


Explicit failure of an action is distinguished from throwing an exception, and 
else allows combination of alternative actions to recover from failure. 


op unfold : -> Action . 
op unfolding _ : Action -> Action . 


The above notation is used to express iteration. 


op copy-bindings : -> Atomic-Action . 
op _ scope _ : Action Action -> Action [assoc] .
op recursively _ : Action -> Action .


In A1 scope A2 above, the bindings computed by A1 are the current bindings
for A2. In recursively A the scope of the bindings computed by A includes A 
itself. 


op create : -> Atomic-Action . 
op inspect : -> Atomic-Action .
op update : -> Atomic-Action . 


The above notation is used for actions concerned with stored data.


op enact : -> Atomic-Action .


When the action enact is given an action as data, it performs that action. Action 
is a subsort of Datum, and (constants corresponding to) action combinators are 
included in Op. Using the combinator scope allows the current bindings to be incorporated
in actions before they are enacted, which supports both static and
dynamic bindings. 

The following is an excerpt from the OBJ specification of the operational 
semantics of kernel AN. It was developed primarily to support validation of the
action semantics of Simple using OBJ. An action has to be supplied with data 
and bindings, as well as access to the store, before it can be performed. The 
outcome of the execution is computed data, thrown data, or failure, together 
with the updated store. 


op {_}_ _ _ : Action Data Bindings Store -> Action .
op {_}_ _ : Action Data Bindings -> Action .
op {_}_ : Action Store -> Action .
op {_}_ : Action Data -> Action .
op {_}_ : Action Bindings -> Action .
vars A A1 A2 : Action .
vars D* D1* D2* : Data .
vars O : Op .
vars BS BS1 BS2 : Bindings .
vars S : Store .


eq {copy}D* BS S = {result D*}S .
eq {result D1*}D* BS S = {result D1*}S .


eq {give O}D* BS S =
if O ? D* then {result (O ! D*)}S else {err}S fi .


eq {A1 then A2}D* BS S = {A1}D* BS S then {A2}BS .
eq {result D1*}S then {A2}BS = {A2}D1* BS S .
eq {thrown D1*}S then {A2}BS = {thrown D1*}S .
eq {fail}S then {A2}BS = {fail}S .
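
The equations in this excerpt can be mirrored by a small interpreter. The Python sketch below is an illustration under assumptions and not a translation of the full OBJ specification: actions are encoded as tuples, an operation is a pair of Python functions (definedness test, application), an outcome is a triple `(kind, data, store)`, and the names `perform` and `half` are hypothetical. Only copy, result, thrown, fail, give, and then are covered.

```python
# Outcome kinds: "result" (normal), "thrown" (exceptional), "fail".

def perform(action, data, bindings, store):
    """Perform an action given data, bindings, and a store; return its outcome."""
    tag = action[0]
    if tag == "copy":                      # {copy}D* BS S = {result D*}S
        return ("result", data, store)
    if tag == "result":                    # {result D1*}D* BS S = {result D1*}S
        return ("result", action[1], store)
    if tag == "thrown":                    # exceptional termination with data
        return ("thrown", action[1], store)
    if tag == "fail":                      # explicit failure
        return ("fail", (), store)
    if tag == "give":                      # apply O if O ? D* holds, else err
        defined, apply = action[1]
        if defined(data):
            return ("result", apply(data), store)
        return ("thrown", (), store)       # err = thrown no-data
    if tag == "then":
        _, a1, a2 = action
        kind, d1, s1 = perform(a1, data, bindings, store)
        if kind == "result":               # the data computed by A1 is given to A2
            return perform(a2, d1, bindings, s1)
        return (kind, d1, s1)              # thrown and fail skip A2
    raise ValueError(f"unknown action: {tag}")

# A hypothetical partial operation: halving, defined on single even integers.
half = (lambda d: len(d) == 1 and d[0] % 2 == 0, lambda d: (d[0] // 2,))
```

For example, `perform(("then", ("copy",), ("give", half)), (10,), {}, {})` terminates normally with the data `(5,)`, whereas giving `half` the data `(3,)` terminates exceptionally, as the equation for give prescribes.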


3.3 Full AN


The primitives and combinators of the full AN are declared below. Note that, 
in contrast to the original version of AN, so-called yielders are not part of the 
kernel. 

obj AN is 

pr KERNEL-AN . 


sort Yielder . 
subsorts Data Op < Yielder . 


op _, _ : Yielder Yielder -> Yielder [assoc] .
op _ _ : Op Yielder -> Yielder . 
op ___ : Yielder Op Yielder -> Yielder . 
op give _ : Yielder -> Action . 
op _ _ : Atomic-Action Yielder -> Action . 


Yielders allow compositions of data operations to be applied to the given data. 
When Y is a yielder, the action give Y gives the result of Y. The action AA Y 
makes the result of Y the data for the (atomic) action AA. For example, if the 
action update is used alone, it has to be given a cell and a storable value as data, 
whereas the action update(the cell, 0) is equivalent to (give the cell and 
result 0) then update, and is given only a cell. 


op check _ : Yielder -> Action . 
op maybe _ : Action -> Action . 


The action check Y merely tests whether the value is true (failing otherwise). 
When A terminates exceptionally, maybe A fails (so maybe give O fails if the
given data is not in the domain of O).


op furthermore _ : Action -> Action .


op _ before _ : Action Action -> Action .


The action furthermore A lets the bindings computed by A override the current 
bindings, so that (furthermore A1) scope A2 corresponds to a block. The ac- 
tion A1 before A2 combines the bindings computed by A1 and A2, letting the 
scope of the former bindings include A2. 


op bind : -> Atomic-Action . 
op current-bindings : -> Yielder . 
op closure _ : Yielder -> Yielder . 


The action bind merely computes a binding from the identifier and bindable 
value given to it. The yielder closure A results in an action which, when en- 
acted, performs A in the scope of the bindings that were current for the yielder. 


op stored-at _ : Yielder -> Yielder .


op bound-to _ : Yielder -> Yielder .


The yielders bound-to Y and stored-at Y refer to components of the current 
bindings and of the store, respectively. 
The following equations illustrate how the expansion of full AN to kernel AN
is specified in OBJ: 


eq give D* = result D* . 

eq give (Y1, Y2) = give Y1 and give Y2 . 

eq give (O Y) = give Y then give O .

eq give (Y1 O Y2) = (give Y1 and give Y2) then give O .

eq give current-bindings = copy-bindings . 

eq give bound-to Y = (copy-bindings and give Y) then give bound . 
eq give stored-at Y = give Y then inspect . 


eq AA Y = give Y then AA . 
eq maybe A = A catch fail . 
eq furthermore A = (copy-bindings and A) then give overriding .


eq A1 before A2 =
(copy-bindings and A1) then 
(give #2 and (give overriding scope A2)) then 
give overriding . 


eq bind = give binding . 


eq closure Y = (result current-bindings) scope Y . 
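
Several of these equations are purely syntactic expansions, so they can be sketched as a term-rewriting function. The Python fragment below is an illustration under assumptions: terms are encoded as nested tuples with tags chosen here ("data", "op", "tuple", "app", "infix" for yielders; "result", "give", "and", "then", "catch", "fail" for kernel actions), and the function names are hypothetical. Only the equations for give, for atomic actions applied to yielders, and for maybe are covered.

```python
def expand_give(y):
    """Expand  give Y  into kernel AN, following the equations above."""
    tag = y[0]
    if tag == "data":       # give D* = result D*
        return ("result", y[1])
    if tag == "op":         # an operation constant is given directly
        return ("give", y[1])
    if tag == "tuple":      # give (Y1, Y2) = give Y1 and give Y2
        return ("and", expand_give(y[1]), expand_give(y[2]))
    if tag == "app":        # give (O Y) = give Y then give O
        return ("then", expand_give(y[2]), ("give", y[1]))
    if tag == "infix":      # give (Y1 O Y2) = (give Y1 and give Y2) then give O
        return ("then",
                ("and", expand_give(y[1]), expand_give(y[3])),
                ("give", y[2]))
    raise ValueError(f"unknown yielder: {tag}")

def expand_atomic(aa, y):
    """AA Y = give Y then AA."""
    return ("then", expand_give(y), aa)

def expand_maybe(a):
    """maybe A = A catch fail."""
    return ("catch", a, ("fail",))
```

With this encoding, expanding the action written update(the cell, 0) yields the term for (give the cell and result 0) then update, in agreement with the example given earlier in this section.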


3.4 Action Semantics 


The action semantics of Simple’s concrete expressions and structured programs 
could be specified as follows. 

First, a semantic function is declared for each sort of concrete syntactic con- 
struct, e.g.: 


op evaluate _ : Exp -> Action . 


(It would be appropriate to specify what sorts of data the action denotations of 
the different sorts of construct may return, but the required notation for subsorts 
of Action has not yet been specified in OBJ.) 

After importing AN and the relevant specifications of data types, the seman- 
tic functions are defined by semantic equations, e.g.: 


eq evaluate (E1 + E2) = 
(evaluate E1 and evaluate E2) then give plus . 


The action combinator A1 and A2 corresponds to so-called target-tupling of 
functions: it performs A1 and A2 (in an unspecified order) and if they both 
terminate normally, it concatenates the data that they computed. In contrast, 
A1 then A2 corresponds to functional composition: it performs A1 first, and if
that terminates normally, it gives any data computed by A1 to the performance 
of A2. There is also a combinator written A1 and-then A2, which is the same 
as A1 and A2 regarding data flow, but insists on sequential performance of the 
sub-actions. For any data operation O, the primitive action give O applies O to
its data to compute a result (terminating exceptionally when the data is not in 
the domain of definition of the operation). 

Apart from their use of common notation for data, actions, and semantic 
functions, the semantic equations for the various constructs are completely inde- 
pendent. For instance, the formulation of the semantic equations for arithmetic 
expressions does not depend at all on whether expression evaluation might have 
side-effects, raise exceptions, never terminate, etc. The only crucial feature of 
expression evaluation is that if it terminates normally, it returns a single data 
item. 
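
The independence claimed here can be made concrete with a small Python sketch. It is an illustration under assumptions, not the author's OBJ code: actions are modelled as functions from (data, store) to (data, store), only normal termination is covered, and the table `SEMANTICS`, the constructs "num", "add", and "postinc", and all function names are hypothetical. The point is that the entry for addition is written once and is left untouched when a side-effecting expression is added later.

```python
def result(d):
    return lambda data, store: (d, store)

def give(op):
    return lambda data, store: (op(data), store)

def then(a1, a2):
    def act(data, store):
        d1, s1 = a1(data, store)         # the data computed by A1 is given to A2
        return a2(d1, s1)
    return act

def and_(a1, a2):
    # One admissible order (left to right); AN leaves the order unspecified.
    def act(data, store):
        d1, s1 = a1(data, store)
        d2, s2 = a2(data, s1)
        return d1 + d2, s2
    return act

def plus(data):
    return (data[0] + data[1],)

SEMANTICS = {}                           # one entry per construct

def evaluate(exp):
    return SEMANTICS[exp[0]](*exp[1:])

SEMANTICS["num"] = lambda n: result((n,))
# evaluate (E1 + E2) = (evaluate E1 and evaluate E2) then give plus
SEMANTICS["add"] = lambda e1, e2: then(and_(evaluate(e1), evaluate(e2)), give(plus))

# A later extension with a side effect; the entry for "add" is not modified.
def post_increment(var):
    def act(data, store):
        old = store.get(var, 0)
        return (old,), {**store, var: old + 1}
    return act

SEMANTICS["postinc"] = post_increment
```

Evaluating the sum of two post-increments of the same variable threads the store through both operands, although the semantic equation for addition never mentions the store.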

Thanks to the independence provided by the use of AN, the semantic equa- 
tions for different constructs never need reformulation when the constructs are 
combined in the same language. Thus the semantic equation for a particular con- 
crete construct can be the same in different languages, which promises a high 
degree of reusability. 

However, recall the hindrances to explicit reuse mentioned in Sect. 2.3: the
dependence on concrete syntax, the relatively deep module hierarchy, and the 
bundling of constructs together in single modules. The next section shows how 
these hindrances can be removed in OBJ. In conjunction with the use of action
semantics as described above, this leads to extreme reusability of parts of
semantic descriptions in OBJ. 


4 Constructive Semantics in OBJ 


As outlined in the introduction, constructive semantics involves two main de- 
partures from the conventional style of semantic description: 


— concrete language constructs are mapped to language-independent abstract 
constructs, and 
— the semantics of each abstract construct is specified in a separate module. 


Together, the above features allow the development of a repository containing se- 
mantic descriptions of individual abstract constructs, as well as the efficient reuse 
of these descriptions in connection with the semantics of concrete languages. 

Below, we shall illustrate the ideas of constructive semantics by showing 
excerpts from a constructive action semantics of Simple, written in OBJ. See 
[13] for further details of the approach, [6] for a constructive action semantics of 
Core ML, and for alternative tool support for constructive action semantics 
based on the ASF+SDF Meta-Environment [7]. 


4.1 Mapping Concrete Languages to Abstract Constructs


The concrete constructs of Simple can be found in some form or other in most 
high-level general-purpose languages. To avoid bias toward particular families 
of programming language, we eschew the use of mixfix notation and concrete 
symbols (such as reserved words or mathematical signs) when declaring abstract 
constructs: the operation symbols are generally abbreviated words² and ordinary
prefix notation is used when writing applications. Here are some examples: 


op assign : Var Exp -> Cmd . 
op cond : Exp Cmd Cmd -> Cmd . 


The abstract constructs are classified as variables (Var), expressions (Exp), com- 
mands (Cmd), etc., according to what kind of values they might compute. 

A mapping from concrete Simple programs to abstract (language-independ- 
ent) constructs is specified inductively in OBJ as illustrated in the excerpt below: 


op [[ _ ]] : Exp.LANG/SIMPLE/EXP -> Exp.CONS/EXP/SYN . 
op [[ _ ]] : Tst.LANG/SIMPLE/TST -> Exp.CONS/EXP/SYN . 
op [[ _ ]] : BPgm.LANG/SIMPLE/BPGM -> Cmd.CONS/CMD/SYN . 
op [[ _ ]] : Pgm.LANG/SIMPLE/PGM -> Cmd.CONS/CMD/SYN . 


*** variables X:
eq [[X]] = X .


*** expressions E:
eq [[I]] = I .
eq [[E1 + E2]] = app(plus, [[E1]], [[E2]]) .


*** tests T:
eq [[B]] = B .
eq [[E1 < E2]] = app(lt, [[E1]], [[E2]]) .


*** basic programs:
eq [[X := E]] = assign(X, [[E]]) .


*** programs P:
eq [[skip]] = skip .
eq [[P1 ; P2]] = seq([[P1]], [[P2]]) .
eq [[if T then P1 else P2 fi]] = cond([[T]], [[P1]], [[P2]]) .
eq [[while T do P od]] = cond-loop([[T]], [[P]]) .


Notice that if Simple were to be extended with an if-then structured program, it 
could be mapped to the obvious combination of previously introduced abstract 
constructs, thus avoiding the introduction of a further abstract construct: 


eq [[if T then P fi]] = cond([[T]], [[P]], skip) .


2 Unabbreviated words can be too long for use in lectures and exercise classes.


Readers who are already familiar with the notation and intended interpretation
of the abstract constructs concerned may find that the specification of
the mapping from concrete to abstract constructs is sufficient explanation of the 
semantics of the concrete constructs. Other readers should consult the action 
semantic descriptions of the abstract constructs involved in the mapping, e.g.: 


eq evaluate app(O, E1, E2) =
(evaluate E1 and evaluate E2) then give O .
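
The inductive mapping from concrete to abstract constructs can be sketched as a recursive function. The Python fragment below is an illustration under assumptions: concrete and abstract terms are nested tuples, the concrete tags (such as "if-then-else") and the function name `to_abstract` are chosen here, and the operation name lt for the less-than test is assumed. It also includes the if-then extension discussed above, which reuses cond and skip instead of introducing a new abstract construct.

```python
def to_abstract(t):
    """[[ _ ]]: map a concrete Simple term to language-independent constructs."""
    if not isinstance(t, tuple):           # variables, integers, Booleans map to themselves
        return t
    tag, *args = t
    if tag == "+":
        return ("app", "plus", to_abstract(args[0]), to_abstract(args[1]))
    if tag == "<":
        return ("app", "lt", to_abstract(args[0]), to_abstract(args[1]))
    if tag == ":=":
        return ("assign", args[0], to_abstract(args[1]))
    if tag == "skip":
        return ("skip",)
    if tag == ";":
        return ("seq", to_abstract(args[0]), to_abstract(args[1]))
    if tag == "if-then-else":
        return ("cond", to_abstract(args[0]),
                to_abstract(args[1]), to_abstract(args[2]))
    if tag == "while":
        return ("cond-loop", to_abstract(args[0]), to_abstract(args[1]))
    if tag == "if-then":                   # extension: no new abstract construct needed
        return ("cond", to_abstract(args[0]), to_abstract(args[1]), ("skip",))
    raise ValueError(f"unknown concrete construct: {tag}")
```

The semantics of a concrete program is then obtained by composing this map with the semantic function for the abstract constructs, so a new concrete form such as if-then needs one extra case in the map and no new semantic equation.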


4.2 Modular Structure of Constructive Action Semantics 


The declaration of the action semantic function for each sort of abstract construct 
is a separate module, e.g.: 


obj CONS/CMD/ACT is 
pr CONS/CMD/SYN AN . 

op execute _ : Cmd -> Action . 
endo 


Hierarchical module names such as CONS/CMD/ACT facilitate navigation among 
large collections of modules, and avoid accidental clashes between names. The 
imported module CONS/CMD/SYN merely declares the sort Cmd, and is therefore 
available for use in connection with alternative styles of constructive seman- 
tics. The module AN, in contrast, declares the full Action Notation, rather than 
just the sort Action. (Modularization of the OBJ specification of AN would be 
possible, but it is irrelevant to the main issues addressed here.) 

The action semantic description of each individual abstract construct is also 
a separate module, e.g.: 


obj CONS/CMD/ASSIGN/ACT is 

pr CONS/CMD/ASSIGN/SYN . 

us CONS/CMD/ACT . 

pr CONS/VAR/ACT CONS/EXP/ACT . 

var V : Var . var E : Exp .

eq execute assign(V, E) = 

(locate V and evaluate E) then update . 

endo 


It needs to import (i.e., depends on) only the modules that declare the action 
semantic functions for the sorts mentioned in the signature of the constructor, 
as well as any data types directly involved in the semantics of the abstract con- 
struct. It never imports other modules concerned with individual abstract con- 
structs. This discipline ensures a very flat modular structure, with no bundling 
of abstract constructs together. Notice that the declaration of each variable used 
in the semantic equation may have to be repeated in many different modules; 
giving these declarations in the modules that introduce the sorts of abstract 
constructs would allow their “importation” using the vars-of feature of OBJ, 
and help to maintain uniformity of names for such variables, but on balance it 
seems preferable to exhibit the sorts of variables locally. 

Thanks to the systematic naming of modules, most of the modules that need 
to be imported for the action semantics of an individual construct are determined 
by the signature of the construct itself. It might be advantageous to generate the 
OBJ modules from more concise specifications where using a sort automatically 
imports the module that declares it (e.g., in OBJ files in the current directory, 
or in a specified search path), and similarly for semantic functions. Such “auto- 
loading” is familiar from Lisp, and has already been employed to considerable 
advantage in ASDF, the Action Semantic Description Formalism developed for 
use in connection with the Action Environment [5,6,15]. 

Finally, the specification of the mapping from a particular concrete language 
imports the concrete syntax of the language and all the modules declaring the 
abstract constructs used in the target of the mapping. The complete action se- 
mantics of the concrete language imports moreover the modules that specify the 
action semantics of the abstract constructs. The action semantics of a concrete 
program is obtained by mapping it to an abstract program and applying the 
appropriate semantic function, and the resulting action can then be performed. 


See http://www.cs.swan.ac.uk/~cspdm/Goguen-FS/ for the full details. 


5 Conclusion 


We have shown how constructive action semantics can be specified in OBJ, 
and given excerpts from such a description of Goguen and Malcolm’s Simple 
illustrative language. Compared to the algebraic semantics of the same language 
given by Goguen and Malcolm, it would appear considerably easier to reuse entire 
modules of our specification when describing extensions or different languages 
(although full-scale case studies supporting this claim have yet to be carried 
out). However, we had to introduce a separate module for each construct, and 
all the explicit imports are somewhat tedious (both to write and to read). 
Abstract. We recall the contribution of Goguen and Burstall’s 1980 
CAT paper and its powerful influence on theories of specification imple- 
mentation that were emerging at about the same time, via the intro- 
duction of the notions of vertical and horizontal composition of imple- 
mentations. We then give a different view of implementation which we 
believe provides a more adequate reflection of the rather subtle interplay 
between implementation, specification structure and program structure. 


1 Introduction 


Goguen and Burstall’s CAT paper is surely the most influential paper in 
the algebraic specification literature never to be properly published in a work- 
shop or conference proceedings or in a journal. The topic of the paper was the 
notion of specification implementation—also known as refinement—as a relation 
on specifications, used for the step-by-step development of a program from a 
specification of requirements. We write $SP \leadsto SP'$ to denote that $SP'$ is an 
implementation of $SP$, with the informal meaning that $SP'$ captures all the 
requirements expressed by $SP$ but in a way that incorporates more design decisions 
and is thus closer to being a program. A hot question at the time was how to 
properly formalise this intuition. Earlier work that was relevant to this question 
was Hoare’s work on data refinement which had been incorporated into 
VDM [Jon80], and Milner’s work on simulations [Mil71]; first approaches in the 
algebraic specification literature were [GTW78] and (early versions of) 
and [EKMP82]. 

The main contribution of [GB80] was to sketch a compelling two-dimensional 
view of implementations, with implementations composing both vertically and 
horizontally. Composition along the vertical dimension corresponds to 
composition of consecutive implementations: if $SP \leadsto SP'$ and also $SP' \leadsto SP''$, 
then one would expect to have $SP \leadsto SP''$. This justifies the correctness of 
the principle of stepwise refinement. (This was called vertical composition 
because Goguen and Burstall drew their implementations vertically, with SP at 
the top; we draw them horizontally here, except in a few diagrams, to save 
space.) Horizontal composition is about composing implementations of parts of 
a specification to give an implementation of the whole: if $SP_1 \leadsto SP_1'$ and 
$SP_2 \leadsto SP_2'$, then one would expect to have $SP_1 \oplus SP_2 \leadsto SP_1' \oplus SP_2'$, for 
any specification-building operation $\oplus$. In particular, this should hold for 
composition of parameterised specifications: if $P_1 \leadsto P_1'$ and $P_2 \leadsto P_2'$ then one 
would expect to have $P_1;P_2 \leadsto P_1';P_2'$. Finally, it was suggested that vertical 
and horizontal composition should satisfy the double law, which says that given a 
diagram of implementations admitting both vertical and horizontal composition 
of implementations, the result is the same whether vertical composition is done 
before or after horizontal composition. 

In Section [2] we recall this work. A vertical composition theorem was the 
main result in many accounts of implementations that were emerging at about 
the same time, sometimes under more or less restrictive conditions on the spec- 
ifications or implementations in question. Horizontal composition proved more 
elusive; in most cases it remained a topic for the “Future Research” section. 
Recent approaches go further. For instance, (cf. [Gog96]) provides some 
algebraic laws that link vertical and horizontal structure, but with what seems 
to be a somewhat different understanding of the vertical dimension. Another 
example is where horizontal composability is achieved for colimits of 
specification diagrams in the context of specifications for reactive systems. Still, 
to our knowledge, no theory of implementations ever entirely fulfilled the dream 
of CAT. 

In Section [4] we give a different view of implementations which we believe 
properly reflects the subtle interplay between implementations, specification 
structure and program structure, and observe that it trivially satisfies a ver- 
tical composition theorem. In Section [5] we consider horizontal composition, and 
conclude that it does not hold in general but neither is it desirable. The problem 
with horizontal composition arises from the lack of correspondence in general 
between the structure of a specification and the structure of a program that im- 
plements it, and the difference between operations for combining specifications 
on one hand and operations for combining program components on the other. 


2 CAT 


The CAT paper outlines a vision for a future interactive programming 
system to be used for the development and maintenance of programs from spec- 
ifications, in which program components were to be equipped with specifications 
of their properties. The processes by which implementations are carried out 
were to be fully modularised and parameterised, and all concepts in CAT were 
to have a full semantic definition in order to support formal proofs of correct- 
ness. Complete system designs were to be obtained by composing a number of 
implementations, each one expressing an elementary design decision. Such a de- 
gree of formalization and modularization would be useful not just for achieving 
correctness but also for restricting the scope of re-checking required when the 
system is modified subsequently. Scherlis and Scott’s Inferential Programming 
paper [SS83], which led to the Ergo project at CMU [LPRS88], contains some 
more detailed ideas along similar lines. 

The important part of [GB80] is only a few pages long, sandwiched between a 
quick review of the features of the then brand-new CLEAR specification language 
and a long OBJ definition that is only marginally relevant (see for 
a presentation of OBJ as it was at the time). The key insight is the recognition 
of a distinction between so-called vertical and horizontal structure: 


“One basic intuition behind CAT is that the process of implementing 
a large program from its specification has a two-dimensional structure. 
One dimension of structure, the horizontal, corresponds to the structure 
of the specification. The second dimension, the vertical, corresponds to 
the sequence of successive refinements of the specification into actual 
code; the specification is at the top, and the code is at the bottom.... A 
major purpose of the CAT project is to render this intuition much more 
precise.” J.A. Goguen and R.M. Burstall 


In elaborating this point, Goguen and Burstall make reference mainly to the 
structure of specifications arising from parameterised specifications, known as 
theory procedures in CLEAR, which provide a specification of requirements that 
any actual parameter needs to satisfy as well as a specification of the result. 
Implementation of one such procedure P by another one P’ having the same 
“metasource” and “metatarget” specifications SP and SP’ respectively (where 
any actual argument specification must extend SP and then the result will extend 
SP’ in a corresponding way) would be represented by the following diagram: 

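[Diagram: a two-cell $\alpha$ from the upper arrow $P$ to the lower arrow $P'$, both running from $SP$ to $SP'$] 

where $\alpha$ gives the relationship between $P$ and $P'$. 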

Nowadays the authors would presumably agree with us (see e.g. [Gog96]) 
that the proper entities here are specifications of parameterised programs, see 
[SST92], that is, descriptions of functions mapping algebras to algebras, rather 
than CLEAR theory procedures which map specifications (descriptions of classes 
of algebras) to specifications. See Section 3. 

Such implementations should compose both vertically and horizontally. Hori- 
zontal composition of implementations refers to composition of implementations 
of parts of a specification to give an implementation of the whole. Given the 
following diagram: 
horizontal composition would give 


[Diagram: a two-cell $\alpha\cdot\beta$ between $SP$ and $SP''$, with upper arrow $P;Q$ and lower arrow $P';Q'$] 


where “·” denotes horizontal composition of implementations and “;” stands 
for composition of specifications of parameterised programs. The same idea 
applies to other specification-building operations: given $\alpha : SP_1 \leadsto SP_1'$ and 
$\alpha' : SP_2 \leadsto SP_2'$, one would expect to have $\alpha \odot \alpha' : SP_1 \oplus SP_2 \leadsto SP_1' \oplus SP_2'$ 
for any specification-building operation $\oplus$. This depends on having an 
operation $\odot$ for combining implementations that corresponds to each operation $\oplus$ for 
combining specifications. But according to [GB80]: 


“Questions remain about how the CLEAR operations can be extended 
from specifications to implementations.” 


Vertical composition of implementations corresponds to stepwise refinement: 





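[Diagram: two-cells $\alpha$ (from $P$ to $P'$) and $\alpha'$ (from $P'$ to $P''$) between $SP$ and $SP'$ compose to a single two-cell $\alpha;\alpha'$ from $P$ to $P''$] 


The composed implementation $\alpha;\alpha'$ combines the design decisions in $\alpha$ with 
those in $\alpha'$: for instance, if $\alpha$ shows how to implement graphs using sets, and 
$\alpha'$ shows how to implement sets using lists, then $\alpha;\alpha'$ shows how to implement 
graphs using lists. 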

Now, suppose we have a structured specification with consecutive implemen- 
tations of its components, like so: 








[Diagram: two-cells $\alpha$ (from $P$ to $P'$) and $\alpha'$ (from $P'$ to $P''$) between $SP$ and $SP'$, next to two-cells $\beta$ (from $Q$ to $Q'$) and $\beta'$ (from $Q'$ to $Q''$) between $SP'$ and $SP''$] 


In this situation we may apply vertical composition to give implementations 
$\alpha;\alpha'$ and $\beta;\beta'$, and then apply horizontal composition to give an implementation 
$(\alpha;\alpha')\cdot(\beta;\beta') : P;Q \leadsto P'';Q''$. Alternatively, we may first apply horizontal 
composition to give implementations $\alpha\cdot\beta$ and $\alpha'\cdot\beta'$, and then apply vertical 
composition to give an implementation $(\alpha\cdot\beta);(\alpha'\cdot\beta') : P;Q \leadsto P'';Q''$. Goguen 
and Burstall conjecture that these two implementations should be the same: 
the order of composition should not matter. If this “double law” holds then 
implementations form a two-dimensional category, see (where the double 
law is called the “interchange law”). They speculate that the double law may 
not hold for some specification-building operations, and then extra care must be 
taken at such points during the implementation process. 

All of this discussion is set in the context of an arbitrary institution (a 
concept which first appeared in the semantics of CLEAR [BG80]), abstracting 
away from the particular logical system used to write specifications. There is 
no formal definition of what implementation of specifications means. Goguen 
and Burstall also suggest that the CAT framework would be appropriate for 
use with various different programming languages and programming paradigms. 
Although functional languages are the most obvious fit, they speculate that 
the use of imperative languages and assembly languages should not pose any 
insurmountable obstacles. 


3 Specifications and Programs 


The precise syntax of specifications is not very important in this paper. More 
significant is the way that the semantics of specifications is defined: for each 
specification $SP$, we define its signature $Sig(SP)$ and its class of models $Mod(SP)$, 
where each $SP$-model is a $Sig(SP)$-algebra: $Mod(SP) \subseteq Alg(Sig(SP))$. The 
signature of a specification defines an interface giving names to the required 
program components, while its models represent programs that are considered to be 
its correct realizations. If $Sig(SP) = \Sigma$ we will say that $SP$ is a $\Sigma$-specification. 

The framework we are describing is independent of any particular institution 
[GB92]. It can therefore be used with different programming paradigms by se- 
lecting a notion of model that reflects the features of the paradigm in question. 
However, for the sake of concreteness and simplicity let us concentrate on stan- 
dard many-sorted algebras over standard algebraic signatures, specified using 
axioms in first-order logic with equality. These capture a subset of Standard ML 
programs (so-called structures) over Standard ML signatures [MTHM97], com- 
prising first-order non-polymorphic datatypes and first-order non-polymorphic 
properly-terminating functions. 


Example 3.1. The following signature defines an interface for a program to sort 
lists of elements with respect to an order relation on the type of elements: 


signature SORTELEM = 

sig 
type elem 
val ord : elem * elem -> bool 
type listelem 
val nil : listelem 
val cons : elem * listelem -> listelem 
val sort : listelem -> listelem 

end 


A structure over this signature provides code for the required components, in- 
cluding such a sorting program: 
structure SortElem : SORTELEM = 
struct 
type elem = int 
fun ord(x,y) = x >= y 
datatype listelem = nil | cons of elem * listelem 
fun sort l = ... (* code for sort *) 
end 


The semantics of Standard ML [MTHM97] can be used to interpret the above 
code as a definition of an algebraic signature, call it [SORTELEM], and a particular 
algebra over this signature, [SortElem] $\in$ Alg([SORTELEM]). 


Example 3.2. The following specification has the above program as a correct 
realization: 


specification SORTELEMSPEC = 
spec 
type elem 
val ord : elem * elem -> bool 
axiom ... (* ord is transitive, reflexive and antisymmetric *) ... 


datatype listelem = nil | cons of elem * listelem 
val sort : listelem -> listelem 


axiom ... (* sort produces a permutation of its input *) 
axiom ... (* the output of sort is ordered according to ord *) ... 
end 


Then Sig(SORTELEMSPEC) = [SORTELEM] and [SortElem] $\in$ Mod(SORTELEMSPEC) 
$\subseteq$ Alg([SORTELEM]). 


For the sake of example, one often considers the following rudimentary ways 
of building specifications: 


basic specifications: For any signature $\Sigma$ and set $\Phi$ of $\Sigma$-sentences, the basic 
specification $\langle\Sigma, \Phi\rangle$ is a $\Sigma$-specification with 
$Mod(\langle\Sigma, \Phi\rangle) = \{M \in Alg(\Sigma) \mid M \models \Phi\}$. 
(SORTELEMSPEC above is a basic specification.) 

union: For any $\Sigma$, given $\Sigma$-specifications $SP_1$ and $SP_2$, their union $SP_1 \cup SP_2$ 
is a $\Sigma$-specification with $Mod(SP_1 \cup SP_2) = Mod(SP_1) \cap Mod(SP_2)$. 

translation: For any signature morphism $\sigma : \Sigma \to \Sigma'$ and $\Sigma$-specification $SP$, 
translate $SP$ by $\sigma$ is a $\Sigma'$-specification with 
$Mod(\text{translate } SP \text{ by } \sigma) = \{M' \in Alg(\Sigma') \mid M'|_\sigma \in Mod(SP)\}$.¹ 

hiding: For any $\sigma : \Sigma \to \Sigma'$ and $\Sigma'$-specification $SP'$, derive from $SP'$ by $\sigma$ 
is a $\Sigma$-specification with 
$Mod(\text{derive from } SP' \text{ by } \sigma) = \{M'|_\sigma \mid M' \in Mod(SP')\}$. 


1 For any signature morphism $\sigma : \Sigma \to \Sigma'$ and algebra $M' \in Alg(\Sigma')$, $M'|_\sigma \in Alg(\Sigma)$ 
is the reduct of $M'$ with respect to $\sigma$, see e.g. [ST99]. When $\sigma$ is a signature inclusion, 
$M'|_\sigma$ may be written as $M'|_\Sigma$. 
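
The four model-class definitions above can be tried out on a toy scale. The following is only a sketch under strong simplifying assumptions that are not part of the text: a signature is a set of constant names of a single sort, the carrier is the small finite set `VALUES`, an algebra is a frozen set of (name, value) pairs, and a signature morphism is a Python dict. All function names are hypothetical.

```python
from itertools import product

VALUES = range(4)  # assumed finite carrier, so that model classes can be enumerated


def algebras(signature):
    """All algebras over a signature: every assignment of a value to each symbol."""
    names = sorted(signature)
    return {frozenset(zip(names, vals)) for vals in product(VALUES, repeat=len(names))}


def reduct(model, sigma):
    """Reduct M'|sigma: each source symbol s is interpreted as M' interprets sigma[s]."""
    m = dict(model)
    return frozenset((s, m[t]) for s, t in sigma.items())


def basic(signature, axiom):
    """Mod(<Sigma, Phi>) = { M in Alg(Sigma) | M satisfies Phi }."""
    return {m for m in algebras(signature) if axiom(dict(m))}


def union(mod1, mod2):
    """Mod(SP1 u SP2) = Mod(SP1) intersected with Mod(SP2)."""
    return mod1 & mod2


def translate(mod_sp, sigma, target_signature):
    """Mod(translate SP by sigma) = { M' in Alg(Sigma') | M'|sigma in Mod(SP) }."""
    return {m for m in algebras(target_signature) if reduct(m, sigma) in mod_sp}


def derive(mod_sp_prime, sigma):
    """Mod(derive from SP' by sigma) = { M'|sigma | M' in Mod(SP') }."""
    return {reduct(m, sigma) for m in mod_sp_prime}


positive = basic({"a"}, lambda m: m["a"] > 0)
even = basic({"a"}, lambda m: m["a"] % 2 == 0)
both = union(positive, even)                     # only a = 2 survives
inclusion = {"a": "a"}                           # inclusion of {a} into {a, b}
wide = translate(both, inclusion, {"a", "b"})    # b is left unconstrained
back = derive(wide, inclusion)                   # hiding b again
```

In this toy setting, hiding after translating along the same inclusion gives back the original model class, which is one way to see that translation only adds unconstrained symbols.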
This follows ASL and is different from CLEAR, where specification 
expressions denoted theories which in turn have model classes, see [ST97] for a 
discussion of the difference. The operations are more primitive but are similarly 
expressive: for instance “+” in CLEAR corresponds to union of suitably translated 
specifications over different signatures, where the translations respect shared 
subspecifications. 

This defines a number of so-called specification-building operations which map 
specifications to more complex specifications: we have constant specification- 
building operations (basic specifications), one binary specification-building op- 
eration (union) and two unary ones (translation and hiding). In fact, each of 
these may be viewed as a family of operations, indexed by signatures (union) and 
signature morphisms (translation and hiding). Once this “static” indexing is 
fixed, each specification-building operation semantically amounts to a function 
on appropriate classes of models. 

One property of the above specification-building operations will prove 
crucial for further considerations: an n-ary specification-building operation op is 
monotone if it is monotone as a function on model classes. That is: for any 
specifications $SP_1, SP_1', \ldots, SP_n, SP_n'$ such that $Sig(SP_i) = Sig(SP_i')$ and 
$Mod(SP_i) \subseteq Mod(SP_i')$ for $i = 1, \ldots, n$, we also have 
$Mod(op(SP_1, \ldots, SP_n)) \subseteq Mod(op(SP_1', \ldots, SP_n'))$. 
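
Monotonicity can be checked exhaustively when the universe of candidate models is small. The sketch below treats a unary specification-building operation as a Python function on frozen sets of models; the universe, the fixed second argument and the `complement` operation are illustrative assumptions (the complement is not an operation from the text, it only serves as a non-monotone contrast).

```python
from itertools import combinations

UNIVERSE = frozenset(range(4))   # assumed finite universe of candidate models
FIXED = frozenset({1, 2})        # model class of some fixed specification


def powerset(universe):
    """All model classes over the universe."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_monotone(op, universe):
    """Check that X <= Y implies op(X) <= op(Y) for all model classes X, Y."""
    classes = powerset(universe)
    return all(op(x) <= op(y) for x in classes for y in classes if x <= y)


def union_with_fixed(mods):
    """Union with a fixed specification: intersect the model classes."""
    return mods & FIXED


def complement(mods):
    """Illustrative non-monotone operation: all candidates that are not models."""
    return UNIVERSE - mods


union_is_monotone = is_monotone(union_with_fixed, UNIVERSE)
complement_is_monotone = is_monotone(complement, UNIVERSE)
```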

All the above specification-building operations, and therefore any operation 
that may be defined using them, are monotone. In fact, nearly all specification- 
building operations one may find in the literature are monotone. The only 
exceptions we are aware of are operations that select initial or free models of 
specifications. One may argue, though, that such an operation should be viewed 
as simply imposing an additional constraint on the class of models of a 
specification, like an axiom, rather than as a specification-building operation in its 
own right (see for instance data constraints in [GB92]). 

Structured specifications in CASL are based on the operations 
above as well; somewhat more convenient notation is introduced there, which we 
will use in examples too. For instance, union (not limited to specifications with 
identical signatures) is written with “and”, translation along surjective signature 
morphisms is written with “with” (followed by the mapping of symbols), and hiding 
is written with “reveal” or “hide” (followed by a list of symbols). Perhaps most 
useful is “then”, which is an obvious combination of a translation along a 
signature inclusion with union to build an extension of a specification by new sorts, 
operations and/or axioms. 


Example 3.3. Here are some examples of structured specifications: 


specification ELEMSPEC = 
spec 
type elem 
val ord : elem * elem -> bool 
axiom ... (* ord is transitive, reflexive and antisymmetric *) ... 
end 
specification ELEMLISTSPEC = 
ELEMSPEC then 
datatype listelem = nil | cons of elem * listelem 
end 
specification PERMELEMSPEC = 
ELEMLISTSPEC then 
val perm : listelem -> listelem 
axiom ... (* perm produces a permutation of its input *) 
end 
specification ORDERELEMSPEC = 
ELEMLISTSPEC then 
val order : listelem -> listelem 
axiom ... (* the output of order is ordered w.r.t. ord *) 
end 
specification STRUCTSORTELEMSPEC = 
{PERMELEMSPEC with perm |-> sort} 
and 
{ORDERELEMSPEC with order |-> sort} 


Specifications SORTELEMSPEC of Example 3.2 and STRUCTSORTELEMSPEC above 
are equivalent: they have the same signature ([SORTELEM] in both cases, see 
Example 3.1) and the same class of models. 


In common with all work on algebraic specification we have taken the view 
that algebras model programs. But in general we are interested in program 
components which define new sorts and operations in terms of some existing ones. 
These may be generic components, where the parameters are supplied explicitly, 
or components that explicitly import or implicitly build on other components. 
In each case, we need to model components as functions mapping algebras to 
algebras; in the case of explicit or implicit imports this reflects the way that the 
newly-defined sorts and operations depend on the imports. 


Definition 3.4. Let $\Sigma$ and $\Sigma'$ be signatures. A $(\Sigma \to \Sigma')$-constructor is a 
function² mapping $\Sigma$-algebras to $\Sigma'$-algebras. 


In the standard algebraic institution, constructors correspond most directly 
to Standard ML functors defining first-order non-polymorphic datatypes and 
first-order non-polymorphic properly-terminating functions, where the input and 
output signatures are explicit. 


Example 3.5. Here is an example of a constructor in Standard ML: 


signature ELEM = 
sig 
type elem 
val ord : elem * elem -> bool 
end 

functor Sort(X: ELEM) : SORTELEM = 
struct 
open X 
datatype listelem = nil | cons of elem * listelem 
fun sort l = ... (* code for sort *) 
end 


The semantics of Standard ML can be used to interpret the above code as 
defining a function mapping [ELEM]-algebras to [SORTELEM]-algebras, i.e. a 
([ELEM] $\to$ [SORTELEM])-constructor. One important property of this function 
is that it is persistent: the argument structure is extended to the result 
structure, preserving the interpretation of parameter types and values. 


2 In general, we need to consider partial constructors, where the result may not be 
defined for every algebra over the parameter signature but only for those that 
satisfy additional constraints. See [ST89]. For simplicity, we restrict attention to total 
constructors here, with a few comments in footnotes concerning partial constructors. 


304 Donald Sannella and Andrzej Tarlecki 


Any $(\Sigma \to \Sigma')$-constructor $\kappa$ determines a specification-building operation, 
written $\kappa$ as well, that takes any $\Sigma$-specification $SP$ to a $\Sigma'$-specification having 
the image of $Mod(SP)$ under $\kappa$ as its models: $Mod(\kappa(SP)) = \{\kappa(M) \mid M \in Mod(SP)\}$. 
Hiding is one such specification-building operation, determined by 
reduct. The other specification-building operations discussed above do not arise 
in such a way, in general. Translation is determined by a total constructor only 
when it is with respect to a bijective renaming,³ and then it coincides with hiding 
with respect to the inverse of that renaming. CASL union is not determined by 
a total constructor unless there is no overlap (“sharing”) between the signatures 
of the arguments.⁴ 
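
As a rough illustration of how a constructor lifts to model classes, the sketch below represents an algebra as a frozen set of (name, value) pairs and a constructor as a plain Python function. The symbol names `a` and `double` and both constructors are made up for the example; they do not come from the text.

```python
def alg(**symbols):
    """Build a toy algebra: a frozen set of (symbol name, value) pairs."""
    return frozenset(symbols.items())


def induced_operation(kappa):
    """Lift a constructor to model classes: Mod(kappa(SP)) = { kappa(M) | M in Mod(SP) }."""
    return lambda mod_sp: {kappa(m) for m in mod_sp}


def add_double(model):
    """Hypothetical constructor: extend a model by double = 2 * a, leaving a unchanged."""
    m = dict(model)
    return alg(**m, double=2 * m["a"])


def forget_double(model):
    """Reduct along the inclusion of {a} into {a, double}: this is hiding."""
    return frozenset((name, value) for name, value in model if name != "double")


mod_sp = {alg(a=1), alg(a=3)}
extended = induced_operation(add_double)(mod_sp)
hidden = induced_operation(forget_double)(extended)
```

Here `hidden` coincides with `mod_sp`, which reflects the fact that `add_double` only adds a new symbol and keeps the interpretation of `a`.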

Constructors may themselves be specified. Just as ordinary 
specifications describe classes of algebras, constructor specifications describe 
classes of constructors, that is, classes of functions mapping algebras to 
algebras [SST92]. 


Definition 3.6. Given specifications $SP$ and $SP'$, the constructor 
specification $SP \to SP'$ specifies the class of $(Sig(SP) \to Sig(SP'))$-constructors that 
map models of $SP$ to models of $SP'$: 
$Mod(SP \to SP') = \{F : Alg(Sig(SP)) \to Alg(Sig(SP')) \mid \text{for each } A \in Mod(SP),\ F(A) \in Mod(SP')\}$.⁵ 

Moreover, when $Sig(SP)$ overlaps with $Sig(SP')$ then the specified 
constructors should preserve the interpretation of the overlapping sorts and operations. In 
particular, when $Sig(SP)$ is a subsignature of $Sig(SP')$, then as in CASL we 
require the functions in $Mod(SP \to SP')$ to be persistent: when 
$F : Alg(Sig(SP)) \to Alg(Sig(SP'))$ is in $Mod(SP \to SP')$ then for every model $A \in Mod(SP)$, 
$F(A) \in Mod(SP')$ is such that $F(A)|_{Sig(SP)} = A$. 
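
Membership in a constructor specification, including the persistence requirement, can be tested directly when model classes are finite. This is a minimal sketch under the same toy encoding of algebras as frozen sets of (name, value) pairs; the two constructors `successor` and `constant` and the model classes are invented for illustration.

```python
def alg(**symbols):
    """Build a toy algebra: a frozen set of (symbol name, value) pairs."""
    return frozenset(symbols.items())


def restrict(model, names):
    """Reduct to a subsignature: keep only the listed symbols."""
    return frozenset((n, v) for n, v in model if n in names)


def in_constructor_spec(f, mod_sp, mod_sp_prime, sig_sp):
    """F is in Mod(SP -> SP') if it maps models of SP to models of SP' persistently."""
    return all(
        f(a) in mod_sp_prime and restrict(f(a), sig_sp) == a
        for a in mod_sp
    )


mod_sp = {alg(a=1), alg(a=2)}
mod_sp_prime = {alg(a=1, b=2), alg(a=2, b=3)}


def successor(model):
    """Persistent constructor: keeps a and adds b = a + 1."""
    m = dict(model)
    return alg(a=m["a"], b=m["a"] + 1)


def constant(model):
    """Always yields the same model of SP', so it overwrites a and is not persistent."""
    return alg(a=1, b=2)


ok = in_constructor_spec(successor, mod_sp, mod_sp_prime, {"a"})
not_persistent = in_constructor_spec(constant, mod_sp, mod_sp_prime, {"a"})
```

The `constant` function does map every model of SP to a model of SP', so it fails only because of the persistence condition.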


3 Translations along surjective signature morphisms are determined by partial con- 
structors, in general. 

4 When there is overlap, CASL union is determined by a partial constructor which 
amalgamates models that coincide on the shared subsignature. 

5 If partial constructors are considered, an additional requirement here would be that 
their domain contains Mod(SP). 
Example 3.7. Recall Examples 3.1-3.3. Then ELEMSPEC $\to$ SORTELEMSPEC is a 
specification of (persistent) constructors F : Alg([ELEM]) $\to$ Alg([SORTELEM]) 
that when given a model E $\in$ Mod(ELEMSPEC) extends it to a model F(E) $\in$ 
Mod(SORTELEMSPEC). One example of such a constructor is the functor Sort $\in$ 
Mod(ELEMSPEC $\to$ SORTELEMSPEC), presented in Example 3.5. Constructor 
specifications correspond to functor specifications in Extended ML, see [KST97]. 


The generalisation to n-ary constructors and constructor specifications is 
straightforward. 


4 Implementations and Vertical Composition 


A very simple notion of specification implementation is the following: 


Definition 4.1. Let $SP$ and $SP'$ be specifications such that $Sig(SP) = Sig(SP')$. 
Then $SP'$ is a simple implementation of $SP$, written $SP \leadsto SP'$, if 
$Mod(SP) \supseteq Mod(SP')$. 


This simply requires that all of the correct realizations of SP’ are correct real- 
izations of SP. That is, SP’ incorporates all the requirements that are in SP, 
and perhaps other constraints that result from additional design decisions. 

For simplicity, the definition of simple implementation requires the signatures 
of both specifications to be the same. The hiding operation may be used to adjust 
the signatures (for example, by removing auxiliary functions from the signature 
of the implementing specification) if this is not the case. 

The fact that simple implementations vertically compose is an immediate 
consequence of the transitivity of the subset relation: 


Proposition 4.2. If $SP \leadsto SP'$ and $SP' \leadsto SP''$ then $SP \leadsto SP''$. 
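
With finite model classes, simple implementation is just set inclusion, and vertical composition is its transitivity. The following sketch uses invented model classes in which each "model" is simply the integer interpreting a single constant a.

```python
def simple_implementation(mod_sp, mod_sp_prime):
    """SP ~> SP' holds exactly when Mod(SP) contains Mod(SP')."""
    return mod_sp >= mod_sp_prime


mod_sp = set(range(10))                        # requirement: 0 <= a < 10
mod_sp1 = {a for a in mod_sp if a % 2 == 0}    # design decision: a is even
mod_sp2 = {4}                                  # final decision: a = 4

step1 = simple_implementation(mod_sp, mod_sp1)
step2 = simple_implementation(mod_sp1, mod_sp2)
composed = simple_implementation(mod_sp, mod_sp2)
```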


The notion of simple implementation is powerful enough (in the context 
of a sufficiently rich specification language) to handle all concrete examples of 
interest. However, it is not very convenient. During the process of developing a 
program, the successive specifications incorporate more and more details arising 
from successive design decisions. Thereby, some parts become fully determined, 
and remain unchanged as a part of the specification until the final program 
is obtained. The following diagram is a visual representation of this situation, 
where $\kappa_1, \ldots, \kappa_n$ label the parts that become determined at consecutive steps. 


It is more convenient to avoid such clutter by separating the finished parts from 
the specification, putting them aside, and proceeding with the development of 
the unresolved parts only: 
where EMPTY is a specification for which a standard implementation empty is 
available. 

It is important for the finished parts $\kappa_1, \ldots, \kappa_n$ to be independent of the 
particular choice of realization for what is left: they should extend any realization 
of the unresolved part to a realization of what is being implemented. This is 
exactly what is required by the notion of a constructor defined in Sect. 3: $\kappa_i$ 
is a function taking models of $SP_i$ to models of $SP_{i-1}$. These considerations 
motivate a more elaborate version of the notion of implementation: 


Definition 4.3 ([ST88b]). Given specifications $SP$ and $SP'$ and a constructor 
$\kappa : Alg(Sig(SP')) \to Alg(Sig(SP))$, we say that $SP'$ is a constructor 
implementation of $SP$ via $\kappa$, written $SP \leadsto_{\kappa} SP'$, if $\kappa \in Mod(SP' \to SP)$. 


Thus, in the development diagram above, $\kappa_i : Alg(Sig(SP_i)) \to Alg(Sig(SP_{i-1}))$ 
with $\kappa_i \in Mod(SP_i \to SP_{i-1})$ for $1 \le i \le n$; that is, each $\kappa_i$ corresponds to 
a parameterised program with input interface $SP_i$ and output interface $SP_{i-1}$. 
Given a model $M$ of $SP_i$, $\kappa_i$ may be applied to yield a model $\kappa_i(M)$ of $SP_{i-1}$. 


Example 4.4. From Example 3.7 we have SORTELEMSPEC $\leadsto_{\mathit{Sort}}$ ELEMSPEC. That 
is, the task of implementing sorting of lists of elements with respect to a function 
ord is reduced by means of the constructor Sort to the task of implementing 
elem and ord. 


The definition of constructor implementation generalises smoothly to imple- 
mentations of constructor specifications. This requires higher-order constructors; 
for details see : 

It is easy to see that constructor implementations compose vertically: 


Proposition 4.5. If $SP \leadsto_{\kappa} SP'$ and $SP' \leadsto_{\kappa'} SP''$ then $SP \leadsto_{\kappa';\kappa} SP''$. 


So, a constructor implementation via $\kappa : Alg(Sig(SP')) \to Alg(Sig(SP))$ 
composed with a constructor implementation via $\kappa' : Alg(Sig(SP'')) \to Alg(Sig(SP'))$ 
yields a constructor implementation via $\kappa';\kappa : Alg(Sig(SP'')) \to Alg(Sig(SP))$, 
which is just the composition of the functions $\kappa'$ and $\kappa$ written in 
diagrammatical order. 

Once the development process is finally complete (that is, when nothing is 
left unresolved, as in the diagram above) we can successively apply the construc- 
tors to obtain a correct realization of the original specification. The correctness 
of the final outcome follows from the correctness of the individual constructor 
implementation steps via vertical composition. 


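Proposition 4.6. Given a chain of constructor implementation steps 

$SP_0 \leadsto_{\kappa_1} SP_1 \leadsto_{\kappa_2} \cdots \leadsto_{\kappa_n} SP_n = \mathrm{EMPTY}$ 

we have $(\kappa_n; \cdots; \kappa_2; \kappa_1)(\mathit{empty}) \in Mod(SP_0)$. 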
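
The way a finished development yields a program can be mimicked by composing functions in diagrammatic order and applying the result to the empty model. The sketch below assumes a two-step chain; the constructors `kappa1` and `kappa2`, the symbols `a` and `double`, and the predicate standing in for Mod(SP0) are all hypothetical.

```python
from functools import reduce


def alg(**symbols):
    """Build a toy algebra: a frozen set of (symbol name, value) pairs."""
    return frozenset(symbols.items())


def compose(*kappas):
    """Diagrammatic composition: compose(k_n, ..., k_1) applies k_n first and k_1 last."""
    return lambda model: reduce(lambda m, kappa: kappa(m), kappas, model)


empty = alg()  # the standard implementation of EMPTY


def kappa2(model):
    """Constructor from models of EMPTY to models of SP1: fix a = 3."""
    return alg(**dict(model), a=3)


def kappa1(model):
    """Constructor from models of SP1 to models of SP0: add double = 2 * a, keeping a."""
    m = dict(model)
    return alg(**m, double=2 * m["a"])


def in_mod_sp0(model):
    """Stand-in for Mod(SP0): algebras over {a, double} with double = 2 * a."""
    m = dict(model)
    return set(m) == {"a", "double"} and m["double"] == 2 * m["a"]


program = compose(kappa2, kappa1)   # corresponds to kappa2;kappa1
result = program(empty)
```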
Many approaches to implementation in the literature make use of a restrictive 
kind of constructor defined by a parameterised program having a particular 
rigid form: for example, the notion of implementation in corresponds 
to the use of a constructor obtained by composing a free construction with a 
reduct, then a restriction to a subalgebra, and finally a quotient, in that order. 
Then the vertical composition of two implementations is required to yield an 
implementation of the same form, which is only possible under certain additional 
conditions on the specifications involved. This amounts to a requirement that the 
composition of parameterised programs be forced into some given normal form, 
which corresponds to requiring programs to be written in a rather restricted 
programming language. 


5 Horizontal Composition 


In Sect. [3] we have recalled a few basic specification-building operations, which 
form the backbone of many specification languages. Since the pioneering work 
on CLEAR [BG80], a number of such languages have been designed and used, 
with CASL as a prime recent example. They all aim at pro- 
viding a convenient way to build specifications in a structured manner, where 
specification-building operations are used to gradually construct more and more 
complex specifications out of simpler component specifications. This horizon- 
tal structure of specifications (in the terminology of [GB80]) is indispensable 
for facilitating the understanding and use of any practical (hence: large and 
complex) specification. Typical ways in which the horizontal structure of 
specifications has been successfully exploited include the compositional semantics of 
complex specification languages like CASL and compositional proof 
systems for consequences of specifications, as introduced in and 
analyzed in [Bor02], even if for practical specification languages compositionality 
may sometimes be sacrificed [MHAH04]. 

Under a mild assumption of monotonicity of the specification-building oper- 
ations involved, the horizontal structure of specifications may also be exploited 
in the development process: 


Proposition 5.1. Suppose that op is a monotone n-ary specification-building 
operation. If $SP_1 \leadsto SP_1'$, ..., $SP_n \leadsto SP_n'$, then 
$op(SP_1, \ldots, SP_n) \leadsto op(SP_1', \ldots, SP_n')$. 


For simple implementations, Prop. 5.1 captures the essence of horizontal composition,
as introduced in [GB80]. For constructor implementations this takes the
following form:


Proposition 5.2. Suppose that op is a monotone n-ary specification-building
operation. If $SP_1 \leadsto_{\kappa_1} SP'_1$, ..., $SP_n \leadsto_{\kappa_n} SP'_n$, then
$op(SP_1, \ldots, SP_n) \leadsto op(\kappa_1(SP'_1), \ldots, \kappa_n(SP'_n))$.


Note that $\kappa_1$ in $\kappa_1(SP'_1)$ refers to the specification-building operation determined
by the constructor $\kappa_1$ (see Sect. 3), and similarly for the other constructors.

The strength and usefulness of Props. 5.1 and 5.2 are severely limited by two
fundamental problems.

First, the consistency of specifications is not preserved under such refinement
in general. In Prop. 5.1, $op(SP_1, \ldots, SP_n)$ may be a perfectly implementable
(consistent) specification, while $op(SP'_1, \ldots, SP'_n)$ is inconsistent, and hence cannot
be implemented, even if implementation of each of the refined individual
component specifications $SP'_1, \ldots, SP'_n$ is unproblematic.


Example 5.3. Consider the following trivial example: 


specification EVEN =
  spec val a : int
       axiom exists k : int. a = 2 * k
  end
specification SMALL =
  spec val a : int
       axiom a > 0 andalso a < 10
  end
specification SMALL_EVEN = SMALL and EVEN


The last specification is formed as a union of two simpler specifications, and thus
combines the requirements they impose. (Obviously, algebras in [SMALL_EVEN]
have $a \in \{2, 4, 6, 8\}$.)

Since and is monotone, Prop. 5.1 allows one to refine SMALL_EVEN by refining
its component specifications independently. Consider for instance: 


specification VERY_EVEN =
  spec val a : int
       axiom exists k : int. a = 8 * k
  end
specification VERY_SMALL =
  spec val a : int
       axiom a > 0 andalso a < 5
  end
specification VERY_SMALL_VERY_EVEN = VERY_SMALL and VERY_EVEN


Clearly, we then have EVEN $\leadsto$ VERY_EVEN and SMALL $\leadsto$ VERY_SMALL, and so
by Prop. 5.1
    SMALL_EVEN $\leadsto$ VERY_SMALL_VERY_EVEN.


However, even though both VERY_SMALL and VERY_EVEN are consistent and sep- 
arately can be easily implemented, the specification VERY_SMALL_VERY_EVEN is 
inconsistent, and so taking this implementation step cannot lead to a final real- 
ization of SMALL_EVEN. 
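
Example 5.3 can be checked mechanically with a small sketch. It assumes the usual reading of simple implementations, namely that SP refines to SP' when every model of SP' is also a model of SP, and it approximates model classes by the values of a in a finite window of integers. The names WINDOW, models, refines and union are illustrative only and do not come from any specification language.

```python
# Illustrative sketch: a specification is a predicate on the value of `a`;
# its model class is approximated over a finite window of integers.
WINDOW = range(-50, 51)  # assumption: wide enough for this example


def models(spec):
    """Return the set of values of `a` in WINDOW that satisfy `spec`."""
    return {a for a in WINDOW if spec(a)}


def refines(sp, sp_refined):
    """Refinement is read here as: every model of sp_refined is a model of sp."""
    return models(sp_refined) <= models(sp)


def union(sp1, sp2):
    """The `and` operation: a model must satisfy both specifications."""
    return lambda a: sp1(a) and sp2(a)


even = lambda a: a % 2 == 0
small = lambda a: 0 < a < 10
very_even = lambda a: a % 8 == 0
very_small = lambda a: 0 < a < 5

small_even = union(small, even)
very_small_very_even = union(very_small, very_even)

print(sorted(models(small_even)))                 # [2, 4, 6, 8]
print(refines(even, very_even))                   # True
print(refines(small, very_small))                 # True
print(refines(small_even, very_small_very_even))  # True, but only vacuously
print(models(very_small_very_even))               # set()
```

The last two lines show the problem: the refinement is sound, yet the refined specification has no models left to implement.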


The above problem with consistency of the refined specification may arise 
even with a unary specification-building operation op (for instance, consider 
translation along a non-injective signature morphism). However, it does not arise 
if the operation op is determined by a constructor. 

The other problem with refinement based on horizontal composability is perhaps
even more fundamental. Although the horizontal structure of a specification
is crucial for its understanding and use, in general this structure may well be 
quite different from the modular structure of the final program that implements 
it. The aims of horizontal structure at the level of the original, high-level, ab- 
stract requirements specification are quite separate from the aims of modular 
structure in the final program. An interesting and convincing example is pre- 
sented in in a somewhat different framework, but the case study and 
the general line of reasoning carry over here as well. The conclusion from this 
is that while horizontal composability (with respect to monotone specification- 
building operations) yields sound refinements and so may be used when appro- 
priate, it cannot be the only way to implement structured specifications. We 
need separate means to explicitly mark design decisions that fix the final mod- 
ular structure of the program under development, which requires the top-level 
specification-building operations to be determined by constructors. Once such a 
design specification has been fixed, this top-level horizontal structure is 
to be preserved in programs resulting from the development process, and further 
development proceeds for each component specification separately. The final re- 
sult is then obtained by applying the top-level constructors to the outcomes of 
these separate developments. 

Consider for instance an n-ary constructor op. Abusing slightly the notation 
of architectural specifications as provided by CASL [CoF 04], a 
design specification that designates the top-level constructor op to be preserved 
and used at the top level of the modular structure of the final program may take 
the following form: 


arch spec OP_DESIGN =
  units U_1 : SP_1
        ...
        U_n : SP_n
  result op(U_1,...,U_n)

This introduces names (U_1, ..., U_n) of units (or modules) to be further developed
as realizations of their specifications (SP_1, ..., SP_n, respectively) and
then put together using the constructor op to yield the overall realization of the
system (see footnote 6). An architectural specification can be compared with ordinary
specifications by defining its models to be all the possible result units that may be built
in this way. Then one may consider refinements involving architectural specifications,
like SP $\leadsto$ OP_DESIGN. This captures a design decision to implement
the specification SP by a modular system, where the top-level modules U_1, ...,
U_n, fulfilling specifications SP_1, ..., SP_n, respectively, are put together using
the constructor op.

In particular, we always have op(SP_1,...,SP_n) $\leadsto$ OP_DESIGN. Note that
op refers here to the specification-building operation determined by the constructor
op; see Sect. 3.


6 If op is partial, it is necessary to ensure that no tuple of models which may potentially 
be given as an argument to op is outside its domain. See [BST02]. 

For unary constructors $\kappa$ (written K in specification text), the constructor
implementation $SP \leadsto_{\kappa} SP'$ corresponds exactly to the refinement $SP \leadsto$ K_DESIGN, where


arch spec K_DESIGN = unit U : SP’ result K(U) 


An important twist in CASL architectural specifications is that the units 
used here may in fact be generic modules, that is, constructors with specifica- 
tions taking the form discussed in Sect. [B] This allows one to delegate “coding” 
of constructors (as, say, Standard ML functors) to further development of the 
corresponding units, and to limit the vocabulary of the constructors in use in 
the result unit expression to a few basic constructs including the application of 
a generic unit to an argument. 


Example 5.4. Recall the specifications in Examples 3.1–3.7. Note that the specification
SORTELEMSPEC requires a sorting program sort for some realization of
the type elem and ordering predicate ord chosen by the implementor. The fol- 
lowing architectural specification decomposes this task by separating out on one 
hand the task to build such a realization for elem and ord, and on the other 
hand, the task of providing a sorting program sort that will work for any such 
realization. The overall result is then given by instantiating the outcome of the 
latter task to the outcome of the former one. 


arch spec SORT_SPEC = 
units E : ELEMSPEC 
S : ELEMSPEC -> SORTELEMSPEC 
result S(E) 


Then SORTELEMSPEC $\leadsto$ SORT_SPEC. We also have STRUCTSORTELEMSPEC $\leadsto$
SORT_SPEC even though the structure of SORT_SPEC does not match the structure
of STRUCTSORTELEMSPEC.
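
As a rough illustration of how SORT_SPEC separates the two tasks, the following sketch renders the generic unit S as a function from realizations of ELEMSPEC to realizations of SORTELEMSPEC. The classes Elem and SortElem are hypothetical stand-ins; the sketch assumes only that ELEMSPEC provides an ordering predicate ord on elements, and the sorting method used inside S is one arbitrary choice.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Elem:
    """Hypothetical realization of ELEMSPEC: just the ordering predicate `ord`."""
    ord: Callable[[Any, Any], bool]


@dataclass(frozen=True)
class SortElem:
    """Hypothetical realization of SORTELEMSPEC: the ELEMSPEC part plus `sort`."""
    elem: Elem
    sort: Callable[[List[Any]], List[Any]]


def S(e: Elem) -> SortElem:
    """Generic unit: must work for any realization of ELEMSPEC it is given."""
    def sort(xs):
        result = []
        for x in xs:
            # skip past the elements that x does not precede, then insert x
            i = 0
            while i < len(result) and not e.ord(x, result[i]):
                i += 1
            result.insert(i, x)
        return result
    return SortElem(elem=e, sort=sort)


# One possible unit E: integers ordered by <=.
E = Elem(ord=lambda x, y: x <= y)

result_unit = S(E)                  # corresponds to `result S(E)`
print(result_unit.sort([3, 1, 2]))  # [1, 2, 3]
```

The point of the decomposition is visible in the code: S never inspects how E was built, so the two units can be developed independently.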


The main point of architectural specifications as sketched above is that fur- 
ther developments of the specified units may proceed independently from each 
other, and the final results of these developments, which fulfill the unit specifi- 
cations, may then be put together as prescribed by the result unit expression. 
Soundness of this procedure is guaranteed by the horizontal composability of
implementations, Props. 5.1 and 5.2; here, however, there is the additional effect that
consistency of the result is ensured provided that each refined component specification
remains consistent.

Note that horizontal composability follows from the following properties of 
implementation steps involving individual component specifications. Let op be 
a monotone n-ary specification-building operation. 


- If $SP_1 \leadsto SP'_1$ then $op(SP_1, \ldots, SP_n) \leadsto op(SP'_1, \ldots, SP_n)$.
- If $SP_n \leadsto SP'_n$ then $op(SP_1, \ldots, SP_n) \leadsto op(SP_1, \ldots, SP'_n)$.


Prop. 5.1 then follows by a simple application of vertical composability
(Prop. 4.2).
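
For n = 2 the argument can be spelled out as a short chain. This is only an unfolding of the component-wise properties listed above, under the assumption that each of them may be applied with arbitrary specifications in the remaining argument positions:

```latex
\begin{align*}
op(SP_1, SP_2)  &\leadsto op(SP'_1, SP_2)
  && \text{(first property, using } SP_1 \leadsto SP'_1\text{)} \\
op(SP'_1, SP_2) &\leadsto op(SP'_1, SP'_2)
  && \text{(last property, using } SP_2 \leadsto SP'_2\text{)} \\
op(SP_1, SP_2)  &\leadsto op(SP'_1, SP'_2)
  && \text{(vertical composability of the two steps)}
\end{align*}
```

Refining the second argument first gives a different intermediate specification but the same final one.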

Similarly, for constructor implementations we have:


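- If $SP_1 \leadsto_{\kappa_1} SP'_1$ then $op(SP_1, \ldots, SP_n) \leadsto op(\kappa_1(SP'_1), \ldots, SP_n)$.
- If $SP_n \leadsto_{\kappa_n} SP'_n$ then $op(SP_1, \ldots, SP_n) \leadsto op(SP_1, \ldots, \kappa_n(SP'_n))$.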


Prop. 5.2 now follows easily by Prop. 4.2.

The refinements of component specifications here are entirely independent 
from each other, and so may be taken in an arbitrary order. “Composition” of 
such independent refinements in any chosen order always yields the same result. 

The key case here is when op is a constructor, and the specification considered 
is the architectural specification OP_DESIGN as above. In the notation of [MST04], 
refinements of individual unit specifications can be defined as follows: 


refinement R_1 = U_1 : SP_1 refined to arch spec
                   unit X_1 : SP’_1
                   result K_1(X_1)
...
refinement R_n = U_n : SP_n refined to arch spec
                   unit X_n : SP’_n
                   result K_n(X_n)


In [MST04], we have introduced the possibility of composing refinements, and
indeed, according to the formal semantics given there, the above refinements can
be composed in an arbitrary order, and each such composition yields the same
result. For instance:


refinement R_1_to_n = R_1 then ... then R_n
refinement R_n_to_1 = R_n then ... then R_1


yields R_1_to_n = R_n_to_1. The fact that these refinements coincide in the case
n = 2 captures the “double law” of [GB80], see Sect. 

In fact, [MST04] provides for the possibility of writing down the corresponding
fragment of a development tree as follows:


arch spec DEVELOP =
  units U_1 : SP_1 refined to arch spec
                     unit X_1 : SP’_1
                     result K_1(X_1)
        ...
        U_n : SP_n refined to arch spec
                     unit X_n : SP’_n
                     result K_n(X_n)
  result op(U_1,...,U_n)


It should be clear (and this can be formally proved within the framework of
[MST04]) that this is equivalent to the following architectural specification:


arch spec OP_DESIGN’ =
  units X_1 : SP’_1
        ...
        X_n : SP’_n
  result op(K_1(X_1),...,K_n(X_n))

This explicitly captures the composition of the design decision to use op as the
top-level constructor (captured by OP_DESIGN) with the constructor implemen- 
tations for components in an arbitrary order. Note that this easily generalises to 
implementations of individual components that lead to further decomposition, 
again given by architectural specifications. 


Example 5.5. Continuing Examples 5.4 and 3.1–3.7, consider the following additional
specification:


specification INSERTELEMLISTSPEC =
  ELEMLISTSPEC then
    val insert : elem * listelem -> listelem
    axiom ... (* if l is ordered then insert(e,l) puts e into l
                 so that the result is ordered *)


Then the architectural specification SORT_SPEC may be refined as follows: 


arch spec SORT_SPEC’ = 
units E: ELEMSPEC 
S: ELEMSPEC -> SORTELEMSPEC 
refined to 
arch spec 
units L: ELEMSPEC -> ELEMLISTSPEC 
I: ELEMLISTSPEC -> INSERTELEMLISTSPEC 
IS: INSERTELEMLISTSPEC -> SORTELEMSPEC 
result lambda X: ELEMSPEC . IS(I(L(X))) 
result S(E) 


We can also make the resulting overall design explicit as follows: 


arch spec SORT_SPEC’’ = 
units E: ELEMSPEC 
L: ELEMSPEC -> ELEMLISTSPEC 
I: ELEMLISTSPEC -> INSERTELEMLISTSPEC 
IS: INSERTELEMLISTSPEC -> SORTELEMSPEC 
result IS(I(L(E))) 


Of course, we then have SORTELEMSPEC $\leadsto$ SORT_SPEC’’. Further development
may involve for instance direct implementations of the generic units L, I and IS 
as Standard ML functors, entirely independent from each other. 
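
The decomposition in Example 5.5 can be mimicked with ordinary functions standing in for the generic units. This is a hedged sketch rather than the Standard ML functors the text has in mind: units are modelled as SimpleNamespace records, lists as tuples, and the record fields (ord, nil, cons, insert, sort) are assumptions about what the specifications provide.

```python
from types import SimpleNamespace


def L(elem_unit):
    """ELEMSPEC -> ELEMLISTSPEC: extend the argument with lists of elements."""
    return SimpleNamespace(**vars(elem_unit), nil=(), cons=lambda e, l: (e,) + l)


def I(list_unit):
    """ELEMLISTSPEC -> INSERTELEMLISTSPEC: add `insert` on ordered lists."""
    def insert(e, l):
        for pos, x in enumerate(l):
            if list_unit.ord(e, x):
                return l[:pos] + (e,) + l[pos:]
        return l + (e,)
    return SimpleNamespace(**vars(list_unit), insert=insert)


def IS(insert_unit):
    """INSERTELEMLISTSPEC -> SORTELEMSPEC: add `sort`, defined via `insert`."""
    def sort(l):
        result = insert_unit.nil
        for e in l:
            result = insert_unit.insert(e, result)
        return result
    return SimpleNamespace(**vars(insert_unit), sort=sort)


E = SimpleNamespace(ord=lambda x, y: x <= y)  # one realization of ELEMSPEC

# SORT_SPEC': the generic unit S is the composite of the three units.
S = lambda X: IS(I(L(X)))
print(S(E).sort((3, 1, 2)))         # (1, 2, 3)

# SORT_SPEC'': the same result unit written out directly.
print(IS(I(L(E))).sort((3, 1, 2)))  # (1, 2, 3)
```

Each function copies the fields of its argument and only adds new ones, which is the sketch's counterpart of a persistent generic unit.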


The above example is misleadingly simple since there is no requirement for
sharing between the units involved in the design. In general this need not be the
case. Suppose that the task of implementing a specification $SP_{orig}$ is decomposed
into the tasks of implementing specifications $SP_1$ and $SP_2$ where
$[SP_1 \text{ and } SP_2] \subseteq [SP_{orig}]$ but the signatures of $SP_1$ and $SP_2$ overlap.
If a realization of $SP_{orig}$ is to be obtained by combining realizations of $SP_1$ and $SP_2$,
these two realizations need to share the realization of their common part. This is handled
as in [Bur84]: we provide a specification SP of the common part and add its
realization as a new task, and then use (persistent) generic units to separately
extend the resulting unit to realizations of $SP_1$ and $SP_2$, thus ensuring that they
share this common part and so can be put together.

Formalizing this: if $Sig(SP) \supseteq Sig(SP_1) \cap Sig(SP_2)$ and
$[SP \text{ and } SP_1 \text{ and } SP_2] \subseteq [SP_{orig}]$, then $SP_{orig} \leadsto$ SHARING_SPEC where


arch spec SHARING_SPEC =
  units U : SP
        F1 : SP -> SP_1
        F2 : SP -> SP_2
  result F1(U) and F2(U)


Here, “and” is a partial binary constructor which amalgamates two models provided
that they coincide on their common subsignature; see footnote 6, and note
that the requirement mentioned there is satisfied. Note again that further
refinements of the components may proceed independently from each other.
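
The role of sharing can be illustrated with a sketch in which a model is a dictionary from symbol names to their interpretations. The function amalgamate is a hypothetical stand-in for the partial constructor “and”, and F1, F2 and the symbols one, double and succ are invented for the illustration. Shared interpretations are compared with ==, which for function-valued symbols in Python means object identity.

```python
def amalgamate(m1, m2):
    """Partial `and`: defined only if m1 and m2 agree on their shared symbols."""
    shared = m1.keys() & m2.keys()
    clashes = sorted(s for s in shared if m1[s] != m2[s])
    if clashes:
        raise ValueError(f"models disagree on shared symbols: {clashes}")
    return {**m1, **m2}


def F1(u):
    """Persistent generic unit SP -> SP_1: extends u, leaving its symbols unchanged."""
    return {**u, "double": lambda x: 2 * x}


def F2(u):
    """Persistent generic unit SP -> SP_2: extends u, leaving its symbols unchanged."""
    return {**u, "succ": lambda x: x + u["one"]}


U = {"one": 1}                     # realization of the common part SP
result = amalgamate(F1(U), F2(U))  # corresponds to `result F1(U) and F2(U)`
print(sorted(result))              # ['double', 'one', 'succ']
```

Because both extensions start from the same U and do not change it, the agreement check cannot fail here. Realizations of the common part built separately would carry no such guarantee.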


6 Conclusions 


What emerged from [GB80] was a powerful and stimulating view of the process of
systematic development of software from high-level formal specifications. What 
was insightful, new and perhaps ahead of its time then was the stress on structure 
as the only realistic means to master the size and complexity of practical software 
development projects. 

The CAT paper identified formally two orthogonal aspects of structure in 
the process of software development: the vertical dimension, the structure of the 
development process as such; and the horizontal dimension, the structure of the 
specifications involved in development. Making this distinction was crucial to 
separating the two dimensions, for separate study, with vertical and horizontal 
composability as the key result to aim for. These separate lines of research re- 
sulted in a lot of interesting work, crucial for an adequate formalisation of the 
development process. 

The vertical dimension proved easier for the theory: in spite of technical dif- 
ficulties, in many frameworks the key vertical compositionality result has been 
established, with our composition of constructor implementations (further gen- 
eralised to composition of abstractor implementations, not discussed here, see 
[ST97]) covering the previous work as special cases, with the results
recalled in Sect. 4.

The horizontal dimension attracted much work and research as well (includ- 
ing the pioneering work by Goguen and Burstall themselves on CLEAR [BG80]) 
with many specification languages designed that included various forms of hor- 
izontal structuring of specifications, and many key results on the use of this 
horizontal structure for proper understanding and use of large specifications. 
However, the interaction of the horizontal structure with development, formulated
in [GB80] as horizontal composability, and the double law used to capture
the interplay between the two dimensions, proved much tougher. In fact, there
are hints in [GB80] which indicate that the authors viewed this idea as somewhat
speculative, and foresaw potential obstacles in making it effective. We have
already quoted their thought that the task to design implementation composition
operations corresponding to all specification-building operations in CLEAR
might be difficult. They also mention that the structure of a specification, with 
horizontal composition as the way to build its implementations, may constitute 
an “implementation bias”, thus (perhaps unnecessarily) preventing implementa- 
tions having a different structure. From our current perspective, it seems a bit 
unrealistic to claim that “this kind of bias seems to be actually desirable for large 
specifications, because it helps the implementer in his difficult task of structur- 
ing the overall program design.” Indeed, this may well be the case sometimes, 
but it is certainly not always true. 

As presented at length in Sect. 5, we are very far from the view that horizontal
composability is unimportant. However, we believe that one should carefully
distinguish and keep separate two conceptually different roles that the horizontal 
structure of a specification may play. One is the usual structuring of specifica- 
tions, used to present the concepts of the problem space in a clear and perspicu- 
ous way. The horizontal structure obtained in this way is in principle irrelevant 
for vertical development, although it may be used when appropriate. The other 
role is the design of the modular structure of the system to be developed. This 
may be viewed as a very special kind of horizontal structure, which indeed is re- 
quired to be preserved throughout development. Horizontal composability with 
respect to this structure is crucial, of course, and the double law is a natural 
and useful consequence. We proposed architectural specifications as a tool for 
capturing horizontal structure of this latter kind. We feel that the overall picture
of vertical development and its interplay with this horizontal structure, as
imposed by architectural specifications and sketched in Sects. 4 and 5, gives a
well-founded account of the ideas that were put forward in [GB80].


Acknowledgements: Hearty congratulations to Joseph on his 65th birthday 
and our thanks to him for the many novel ideas that over the years have stimu- 
lated much of our own work as well! 

Abstract. Goguen emphasized long ago that colimits are how to compose
systems [7]. This paper corroborates and elaborates Goguen’s vision
by presenting a variety of situations in which colimits can be mechan- 
ically applied to support software development by refinement. We il- 
lustrate the use of colimits to support automated datatype refinement, 
algorithm design, aspect weaving, and security policy enforcement. 


1 Introduction 


Goguen emphasized long ago that colimits are how one composes systems [7]. 
In particular, Burstall and Goguen focused on specifications as presentations 
of theories and the composition of specifications by colimit in the CLEAR and 
CAT system proposals [8,11]. In a sense this paper serves to corroborate and
elaborate Goguen’s insight through its applicability to software development by 
refinement of specifications. 

Kestrel’s Specware system [29,12] is a descendant of CLEAR and CAT that
uses the cocomplete category of specifications over higher-order logic. Specware 
is used to support the refinement of specifications into correct code in various 
target programming languages, including CommonLisp, C, and Java. The role 
of category theory is to organize the larger-scale structure of specifications and 
the refinement process. The objects of the category are specifications, diagrams 
represent structured specifications, and morphisms represent inclusions, param- 
eters, and refinements. Specware uses a colimit algorithm to compose specifica- 
tions and it uses pushouts to instantiate parameterized specifications (as in [8]). 
Most of the detailed design work in software development is logical in nature and 
is performed inside specifications (i.e. below the level of the category). No deep 
results of category theory are used, but the structuring provided by the categor- 
ical framework has been conceptually useful and has guided the implementation 
of Specware. The Specware system has been used for a variety of applications 
involving both high assurance (e.g. [4]) properties and high performance (e.g. 


m). 

The most basic and straightforward use of colimits in a category of specifica- 
tions is to build large specifications out of smaller specifications [2]. We briefly 
review the technicalities of this usage, but the main focus of the paper is on 
how to use composition by colimit to construct refinements. In particular, we
discuss (1) how to represent design abstractions as specifications and specifica- 
tion morphisms and how to apply a design abstraction by colimit, and (2) how 
to express some kinds of policy requirements by automata and how to enforce 
such policies by a suitable colimit. The concepts are illustrated by examples from 
automated datatype refinement, algorithm design, aspect weaving, and security 
policy enforcement. 


2 Preliminaries 


We briefly review the category of specifications over classical higher-order logics, 
since all the examples and discussion build on it. 

A specification (or spec) is the finite presentation of a theory. The signature 
of a specification provides the vocabulary for describing objects, operations, and 
properties in some domain of interest, and the axioms constrain the meaning 
of the symbols. For example, the following specification for partial orders is 
expressed in the MetaSlang specification language of Specware. It introduces a 
type symbol E and an infix binary predicate on E, called le, which is constrained
by the usual axioms. 


spec Partial-Order is
  type E
  op _le_ : E, E → Boolean
  axiom reflexivity is x le x
  axiom transitivity is x le y ∧ y le z ⇒ x le z
  axiom antisymmetry is x le y ∧ y le x ⇒ x = y
end-spec


A specification morphism translates the language of one specification into the 
language of another specification, preserving the property of provability, so that 
any theorem in the source specification remains a theorem under translation. In 
Specware, a specification morphism m : T → T’ is given by a map from the type
and operator symbols of the domain spec T to the symbols of the codomain spec 
T’. To be a specification morphism it is sufficient to show that every axiom of 
T translates to a theorem of T’. It then follows that a specification morphism 
translates theorems of the domain specification to theorems of the codomain. 

For example, a specification morphism from Partial-Order to Integer can be 
presented by: 


morphism Partial-Order-to-Integer : Partial-Order → Integer is
  {E ↦ Integer, le ↦ ≤}


where Integer is a specification for the integers that includes the usual constants
(such as 0), comparison relations (such as less-than-or-equal ≤), functions (such as
addition), and so on.

Translation of an expression by a morphism is by straightforward application
of the symbol map, so, for example, the Partial-Order axiom ∀(x : E) x le x
translates to ∀(x : Integer) x ≤ x. With a reasonable axiomatization of the
integers it is easy to verify that the three axioms of a partial order remain
provable in Integer theory after translation.
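
Translation along a morphism can be sketched as a renaming over expression trees. The nested-tuple encoding of expressions and the names MORPHISM and translate are assumptions made for this illustration and are not Specware's representation. The final checks only test the translated axioms on a few integers, which is a spot check and not a proof.

```python
import operator
from itertools import product

# Symbol map of Partial-Order-to-Integer; "<=" stands for the integer ordering.
MORPHISM = {"E": "Integer", "le": "<="}


def translate(expr, sym_map):
    """Rename every symbol in a nested-tuple expression according to sym_map."""
    if isinstance(expr, tuple):
        return tuple(translate(e, sym_map) for e in expr)
    return sym_map.get(expr, expr)


# The reflexivity axiom: forall (x : E) x le x
reflexivity = ("forall", ("x", "E"), ("le", "x", "x"))
print(translate(reflexivity, MORPHISM))
# ('forall', ('x', 'Integer'), ('<=', 'x', 'x'))

# Spot-check the three translated axioms on a small sample of integers.
le = operator.le
sample = range(-3, 4)
assert all(le(x, x) for x in sample)
assert all(le(x, z) for x, y, z in product(sample, repeat=3) if le(x, y) and le(y, z))
assert all(x == y for x, y in product(sample, repeat=2) if le(x, y) and le(y, x))
```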

Specification morphisms compose in a straightforward way as the compo- 
sition of finite maps. It is easily checked that specifications and specification 
morphisms form a category SPEC. Colimits exist in SPEC and are easily computed.
Suppose that we want to compute the colimit of $B \xleftarrow{\,i\,} A \xrightarrow{\,j\,} C$. First,
form the disjoint union of all sort and operator symbols of A, B, and C, then
define $\approx$ as the equivalence relation on those symbols generated by:

    $s \approx t$ if ($i(s) = t \lor i(t) = s \lor j(s) = t \lor j(t) = s$).

The signature of the colimit (also known as pushout in this case) is the collection
of equivalence classes wrt $\approx$. The cocone morphisms take each symbol into its
equivalence class. The axioms of the colimit are obtained by translating and
collecting each axiom of A, B, and C. The colimit can be scalably computed in
near-linear time.

For example, suppose that we want to build up the theory of partial orders 
by composing simpler theories. 


spec BinRel is
  type E
  op _le_ : E, E → Boolean
end-spec

spec PreOrder is
  import BinRel
  axiom reflexivity is x le x
  axiom transitivity is x le y ∧ y le z ⇒ x le z
end-spec

spec Antisymmetry is
  import BinRel
  axiom antisymmetry is x le y ∧ y le x ⇒ x = y
end-spec





The pushout of Antisymmetry ← BinRel → PreOrder is isomorphic to
the specification for Partial-Order given above. In detail: the morphisms are
{E ↦ E, le ↦ le} from BinRel to both PreOrder and Antisymmetry. The
equivalence classes are then {{E, E, E}, {le, le, le}}, so the colimit spec has one
type (which we rename E), and one operator (which we rename le). Furthermore,
the axioms of BinRel, Antisymmetry, and PreOrder are each translated to
become the axioms of the colimit. Thus we have Partial-Order.
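
The signature part of this pushout computation can be sketched with a union-find structure, which is one standard way to approach the near-linear bound mentioned above. The sketch uses path halving only, without union by rank. The function pushout and the tagging of symbols by "A", "B" and "C" are illustrative choices, and collecting the translated axioms is omitted.

```python
def pushout(a_syms, b_syms, c_syms, i, j):
    """Signature of the pushout of B <-i- A -j-> C, as a set of equivalence classes.

    Symbols are tagged with their spec of origin to form the disjoint union.
    """
    nodes = ([("A", s) for s in a_syms]
             + [("B", s) for s in b_syms]
             + [("C", s) for s in c_syms])
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    def union(m, n):
        parent[find(m)] = find(n)

    for s in a_syms:
        union(("A", s), ("B", i[s]))  # s is identified with i(s)
        union(("A", s), ("C", j[s]))  # s is identified with j(s)

    classes = {}
    for n in nodes:
        classes.setdefault(find(n), set()).add(n)
    return {frozenset(c) for c in classes.values()}


# A = BinRel, B = Antisymmetry, C = PreOrder; both morphisms are identities.
sig = ["E", "le"]
ident = {"E": "E", "le": "le"}
for cls in sorted(pushout(sig, sig, sig, ident, ident), key=sorted):
    print(sorted(cls))  # one class for E and one for le, three members each
```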

The universal property of the colimit means that there exists a unique speci- 
fication morphism from the constructed Partial-Order specification to any other 
specification that refines both PreOrder and Antisymmetry. Intuitively, Partial-Order
is the simplest specification that composes the logical content of PreOrder
and Antisymmetry. 

Although the definitions above are given in higher-order logic, the concepts
presented below essentially assume a cocomplete category of specifications over 
an institution [9]. 

For purposes of refinement, a loose semantics is natural. Semantics of a re- 
finement morphism is given by a contravariant functor into CAT, the category 
of small categories. That is, each spec denotes a category of models, and each 
morphism denotes a functorial mapping that takes each codomain model into a 
domain model. Particular semantics are enforced by applying appropriate refine- 
ments and when performing the institution morphism from the spec language to 
a programming language. 


3 Composing and Refining Specifications 


Kestrel’s work emphasizes automated tools for the refinement of specifications. 
There are several reasons for taking this approach to software development: 
(1) enhanced productivity through automated code generation, (2) enhanced 
assurance due to the correct-by-construction characteristic of refinement-based 
derivations, and (3) enhanced software quality and performance due to automated
application of codified best-practice design knowledge.

The first step in developing a new software application in Specware is build- 
ing a domain specification and capturing the requirements of the application. 
Composition by colimit plays a major role in building domain specifications. An 
example from scheduling is shown in Figure 1. Generally, scheduling is about
the allocation of resources to tasks so as to satisfy constraints on timeliness, 
capacity, cost, and so on. In the figure, specifications for Time and Quantity 
are shared between Task (modeling scheduling tasks) and Resource (modeling 
resources to carry out tasks). Quantity is used to model demand in Task and to 
model capacity in Resource. A pushout is also used to instantiate a spec SET of 
finite sets that is parameterized on a base type (called 1-Sort here). The actual 
requirements are expressed by input/output constraints (pre/post-conditions) 
on the scheduler (for more details, see [28]). 

In a refinement setting, a formal specification of system requirements is re- 
fined to code by incrementally adding design detail. Increments of implementa- 
tion detail are expressed as morphisms between specifications (in an appropriate 
category). There is an active community of researchers and practitioners explor- 
ing the issues of building requirement specifications out of the (sometimes con- 
flicting) agendas of various stakeholders. What has been missing in this picture 
is a focus on how to construct refinements — are they mostly ad-hoc, or can they 
be derived from reusable design abstractions? Most approaches to refinement in 
the literature (e.g. VDM, Z, RAISE, B) rely on manual invention of refinements, 
followed, if desired, by verification of the refinement conditions. Our approach, 
implemented in KIDS and Specware/Designware, has hypothesized that most 
code is derived from reusable design abstractions and that these can be codified 
and mechanically applied. A key component of our research has been collecting 
and formalizing principles of excellent design practice, as found in algorithm design
textbooks and practice, system design patterns/architectures/frameworks,
and so on.

Fig. 1. Scheduling Domain Specification

The purpose of this paper is to highlight the ways in which colimits, in 
suitable categories, play a central role in composing these sources of information 
with the evolving design in order to mechanically generate refinements. Since the 
colimit is scalably computable in the categories of interest, it can play a central 
role in a refinement-oriented mechanized system development environment. 


4 Design by Classification 


Design knowledge typically has two essential components: its content and a char- 
acterization of situations in which the content applies. We represent these two 
components as the codomain and domain of a morphism, respectively. That is, 
abstract design knowledge about datatype refinement, algorithm design, soft- 
ware architectures, program optimization rules, visualization displays, and so 
on, can be expressed as refinements (i.e. morphisms). The codomain embodies 
a design constraint — the effect is a reduction in the set of possible implementa- 
tions. The domain of one such refinement represents the abstract structure that 
is required in a user’s specification in order to apply the embodied design knowl- 
edge. The codomain of the refinement contains new structures and definitions 
that are composed with the user’s requirement specification. 

The following diagram shows the application of a library refinement A → B to a
given specification Spec_0:

    A ------> Spec_0
    |           |
    v           v
    B ------> Spec_1

First the library refinement is selected. The applicability of the refinement to
Spec_0 is shown by constructing a classification arrow from A to Spec_0 which
classifies Spec_0 as having A-structure by making explicit how Spec_0 has at least
the structure of A. Finally the refinement is applied by computing the pushout.
The colimit algorithm generates both the refined specification (the apex shown in
the lower right) and the cocone morphisms, including the refinement morphism
Spec_0 → Spec_1. The creative work lies in constructing the classification arrow [21,22].

Furthermore we can organize the design theories into libraries with a taxonomic
structure: more general theories refine to more specialized theories. Mechanisms
for incrementally accessing and applying design theories from such a library are
discussed in [22].
The next two subsections elaborate these notions in the context of datatype 
refinement and algorithm design respectively. 


4.1 Datatype Refinement 


Abstract data types (ADTs) allow us to think about data structures in terms of 
their essential operations and properties. To work effectively with ADTs we must 
add back in the implementation detail that is abstracted away in a, say, algebraic 
presentation of an ADT. Refinements serve this purpose. Specifically a morphism 
between an ADT theory and a (more concrete) datatype theory presents a way 
to implement the ADT (or dually, a way to view the implementing datatype as 
the ADT). 

Some specific examples include finite set theory mapping to lists or B-trees
or splay trees. Another example: finite sets over a small finite type mapping to 
hash tables or bit vectors. 


Each of these refinements/interpretations can be represented and stored in a
library. To apply a datatype refinement we compute the following pushout:

    AbstractDT ------> Spec_0
        |                 |
        v                 v
    ConcreteDT ------> Spec_1

We sketch a simple example to illustrate the representation of abstract design
knowledge as morphisms. In particular, we can refine finite sets to bit vectors as
follows. Finite sets over the range 1..32 are partially specified by


spec FiniteSet is
  type FSet
  type Elt = 1..32                     % range from 1 to 32
  op {} : FSet                         % empty set
  op _ with _ : FSet × Elt → FSet      % add an element
  op _ ∪ _ : FSet × FSet → FSet        % union
  op _ ∩ _ : FSet × FSet → FSet        % intersection
  axiom commutativity is ((S with a) with b) = ((S with b) with a)
  axiom idempotence is ((S with a) with a) = (S with a)
end-spec

Bit vectors of length 32 are partially specified by 


spec BitVector32 is
  type BV32
  type Index = 1..32
  op zero : BV32                       % the zero bit vector
  op set : BV32 × Index → BV32         % set index bit to 1
  op _ | _ : BV32 × BV32 → BV32        % bitwise OR
  op _ & _ : BV32 × BV32 → BV32        % bitwise AND
  op _ << _ : BV32 × Index → BV32      % left shift
end-spec


A refinement of FiniteSet to BitVector32 is presented by the morphism 


morphism FSet-to-BitVector32 is
  { FSet ↦ BV32
    Elt  ↦ Index
    {}   ↦ zero
    with ↦ set
    ∪    ↦ |
    ∩    ↦ &
  }


If we have a specification S that imports FSet, then taking a pushout of
FSet-to-BitVector32 with the import morphism FSet → S yields a refinement
of S in which finite sets are implemented as bit vectors.
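
To make the morphism concrete, the following is a minimal sketch in Python, assuming that a BV32 value is modelled as an integer restricted to 32 bits. The names MASK32, set_bit, bv_or, bv_and and axioms_hold are illustrative assumptions, not part of Specware. The sketch interprets each FiniteSet operation by its image under the morphism and checks the two translated axioms on sample values.

```python
# Hypothetical Python model of the BitVector32 operations; an illustration,
# not code generated by Specware.
MASK32 = (1 << 32) - 1          # keeps every value within 32 bits

zero = 0                        # image of {} under the morphism


def set_bit(bv: int, index: int) -> int:
    """Image of `with`: set the bit for an index in the range 1..32."""
    if not 1 <= index <= 32:
        raise ValueError("index must lie in 1..32")
    return bv | (1 << (index - 1))


def bv_or(a: int, b: int) -> int:
    """Image of union: bitwise OR."""
    return (a | b) & MASK32


def bv_and(a: int, b: int) -> int:
    """Image of intersection: bitwise AND."""
    return (a & b) & MASK32


def axioms_hold(samples) -> bool:
    """Check the translated commutativity and idempotence axioms."""
    for s in samples:
        for a in range(1, 33):
            if set_bit(set_bit(s, a), a) != set_bit(s, a):
                return False
            for b in range(1, 33):
                if set_bit(set_bit(s, a), b) != set_bit(set_bit(s, b), a):
                    return False
    return True
```

Checking the axioms on samples is only a sanity test; in the refinement setting the translated axioms are proof obligations to be discharged as theorems of BitVector32.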

As another example, in Specware the splay tree refinement is the default
implementation given to sets because of its good performance profile.
Programmers might be tempted to avoid working with splay trees since their
implementation is more complex than simpler representations of sets. A
refinement setting allows developers to work with appropriate abstractions and
still obtain good performance.


4.2 Algorithm Design 


Just as an algebraic presentation of a datatype aims to capture the abstract 
essence of the type, an algorithm theory aims to capture the abstract essence of 
a class of algorithms [27]. For example, consider the class of greedy algorithms 
(which work to build a solution by iteratively adding the best available
component to the incremental solution, until no more components remain). The greedy
algorithm can be abstractly represented by a program scheme, which is a defini- 
tion in a theory that contains partially specified function symbols. A sufficient 
condition that the scheme generates an optimal solution is given by the matroid 
property (which is comprised of four conditions; see e.g. [15]). We can represent 
this package (sufficient structure plus program scheme) as a morphism, prove it 
once, and store it in a library. 

A pushout can be used to apply such an algorithm refinement to a particular
problem. For example, the problem of finding a minimum spanning tree can be
solved by applying the greedy algorithm theory, yielding Kruskal's algorithm (or
Prim's algorithm, depending on how the classification arrow is constructed).


    Matroid Conditions ------> MST
            |                   |
            v                   v
    Greedy Scheme ------> Kruskal Algorithm
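
The idea of a program scheme with partially specified function symbols can be sketched in Python as a higher-order function. This is an illustrative assumption about the shape of such a scheme, not the scheme used in KIDS or Specware: greedy leaves weight and independent unspecified, and the hypothetical minimum_spanning_tree plays the role of the classification arrow by supplying edge weights and an acyclicity test, which yields a Kruskal-style algorithm.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

C = TypeVar("C")


def greedy(components: Sequence[C],
           weight: Callable[[C], float],
           independent: Callable[[List[C]], bool]) -> List[C]:
    """Generic greedy scheme: add the cheapest component that keeps the
    partial solution independent."""
    solution: List[C] = []
    for c in sorted(components, key=weight):
        if independent(solution + [c]):
            solution.append(c)
    return solution


Edge = Tuple[str, str, int]     # (endpoint, endpoint, weight)


def acyclic(edges: List[Edge]) -> bool:
    """Independence test for the graphic matroid, via union-find."""
    parent = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v, _ in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                    # this edge would close a cycle
        parent[ru] = rv
    return True


def minimum_spanning_tree(edges: Sequence[Edge]) -> List[Edge]:
    """Instantiate the scheme for MST: Kruskal-style edge selection."""
    return greedy(edges, weight=lambda e: e[2], independent=acyclic)
```

Re-running the acyclicity test from scratch for each candidate keeps the sketch short; an efficient version would maintain the union-find structure incrementally, which is the kind of optimization a later refinement step could introduce.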


This approach to automated algorithm design was first implemented in the
KIDS system [20] and more clearly in Specware/Designware [24, 23]. A series
of complex high-performance scheduling algorithms for Air Force applications
was developed using this approach in KIDS [28] and in a domain-specific variant
of Designware called Planware [1].


5 Policy Enforcement 


The previous two sections describe a means for applying abstract design knowl- 
edge to generate algorithms. When we turn to system design, there are issues to 
contend with that arise less obviously in algorithm design. In particular, cross- 
cutting concerns are one source of the extra complexity that arises in system 
design. A concern is cross-cutting if its manifestation cuts across the dominant 
hierarchical structure of a program. Cross-cutting concerns explain a significant 
fraction of the code volume and interdependencies of a system. The interdepen- 
dencies complicate the understanding, development, and evolution of the system. 

In this section, we illustrate two forms of cross-cutting concerns and how they 
can be expressed and mechanically enforced. We call these concerns policies to 
emphasize that (1) they are really requirements, and (2) they tend to reflect 
non-functional concerns, such as auditing, security, and so on. 

The following colimit shows the intention of our approach: to use a colimit 
in a suitably defined category to enforce policies on a system design. 


    Shared Structure ------> System
           |                    |
           v                    v
        Policy ------> System with Enforced Policy

One issue that arises in this context is knowing where the policy applies.
For example, a security policy must be applied pervasively in order to provide 
assurance. In our approach, static analysis is used to find all occurrences 
and to set up the cospan (i.e. the Shared Structure specification above). 


5.1 AOP as Invariant Maintenance 


A simple example of a cross-cutting concern is an error logging policy — the 
requirement to log all errors in a system in a standard format. Error logging 
necessitates the addition of code that is distributed throughout the system code, 
even though the concept is easy to state in itself. 

Aspect-oriented programming (AOP), as exemplified by AspectJ [13], pro- 
vides a modular way to treat cross-cutting concerns. However, AspectJ aspects 
are expressed at a programming language level which obscures their intention. 
The reason for this, of course, is to lower the barriers to usage amongst the broad 
Java programming community. In [25] we proposed some techniques for specify- 
ing cross-cutting concerns as logical invariants to be maintained. For example, to 
express an error-logging policy as an invariant, we assert that the error-log data 
structure is equal to the list of all previous errors that have occurred during the 
course of the computation. To formalize this invariant, we need to reify the his- 
tory of the computation, purely for specification purposes [25]. The counterpart 
to aspect weaving is (1) to use static analysis to find all code locations where the 
invariant might be violated, and (2) to specify and synthesize code to reestab- 
lish the invariant. For the error-logging example, static analysis would find all 
potential code locations where an error might be thrown, and the composition 
process would compose the throw with an update of the error-log data structure. 
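
The error-logging invariant can be sketched in Python as follows. This is a toy model under stated assumptions: the class System, its fields history and error_log, and the methods throw and throw_enforced are hypothetical names chosen for illustration. The history field stands for the reified computation history, which exists for specification purposes only.

```python
class System:
    """Toy system with a reified error history (specification-only state)."""

    def __init__(self) -> None:
        self.history = []     # all errors that have occurred so far
        self.error_log = []   # the data structure the policy talks about

    def invariant(self) -> bool:
        """The policy: the error log equals the list of previous errors."""
        return self.error_log == self.history

    def throw(self, err: str) -> None:
        """Original system step: an error occurs, the log is not updated."""
        self.history.append(err)

    def throw_enforced(self, err: str) -> None:
        """Composed step: the throw plus code that reestablishes the invariant."""
        self.throw(err)
        self.error_log.append(err)
```

A bare throw breaks the invariant, which is what static analysis would flag; the composed step keeps it.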

By expressing cross-cutting concerns as invariants, we capture their intention 
more clearly and we can use algorithmic means (static analysis) to determine 
the complete extent of their application, in contrast to the manual coding of join 
points in AspectJ. 

Our point here is that one of the key mechanisms underlying the enforce- 
ment of an invariant is a suitable pushout. To see this most clearly, we switch 
to a category of abstract state machines over a suitable specification language; 
e.g. see especs in [16, 17]. Here the objects are state machines and the
morphisms/refinements represent the simulates relation between automata. An
abstract state is given by a specification (for especs we use the higher-order
specifications of Specware). For our purposes here, an abstract transition will
be specified by a pre/post-condition pair: A - -[Pre, Post]- -> B (we use dashed
arrows for transitions to distinguish them from morphisms in a diagram).

In a category of state machines, particularly especs, refinement means
simulation, and colimit serves (1) to compose the corresponding state
specifications (the pushout of A and C is denoted A ⊕ C) and (2) to superpose
the actions on abstract transitions (the pushout of actions effectively conjoins
their effects so the composite action achieves both simultaneously). The
following diagram illustrates the composition of one step of the source system
A - -> B with a step of the policy C - -> D. The policy asserts that I is to hold
invariantly at states, and the effect of composition is to add the invariance
requirement to the system.


    A - - [Pre, Post] - -> B          C - - - -> D

    A ⊕ C - - [Pre ∧ I, Post ∧ I] - -> B ⊕ D


Static analysis is used to find the association of system steps and policy steps,
then a pushout, as above, is used to compose the two. A further synthesis step
is needed in order to synthesize an action that achieves the composed action
specification [Pre ∧ I, Post ∧ I]. For a variety of detailed examples see ‘


5.2 Enforcing Automata-Based Security Policies 


The previous section described a simple kind of policy, based on invariant state 
properties, and the composition and synthesis mechanisms that underlie enforce- 
ment. A more general kind of policy can be specified by means of automata or 
by temporal logic formulas. 

As a concrete example, consider the following simple security policy which 
is adapted from Schneider [19]. Whenever a process reads from a particular file 
f, it is not allowed to send any messages. The policy states a particular kind of 
information flow constraint. The policy can be expressed as a policy automaton: 


    state 0 --( α: ¬read f/ )--> state 0
    state 0 --( β: read f/ )--> state 1
    state 1 --( γ: ¬send/ )--> state 1
    state 1 --( δ: send/abort )


The transitions are labeled in the form name : event/action. The events are 
expressed as source-code patterns that either succeed (with bindings of pattern 
variables) or fail. If an action is omitted, then it is a no-op. This policy
has only one prescription of an action to take in a particular context: in
policy state 1, if a send is attempted, then abort the program. The effect of
enforcement
will be to terminate any behaviors that do not implement the policy (a send 
following a read of file f). For examples of the enforcement of automata-based 
policies that prescribe behavior, see the error-handling policies in [26]. 
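
As an illustration, the policy automaton can be run as a monitor over a sequence of events. The sketch below is an assumption-laden Python model, not the enforcement mechanism of the paper: events are plain strings, the file name f is fixed, and run_policy is a hypothetical helper that returns the final policy state, whether the abort action fired, and the prefix of events that was allowed.

```python
from typing import List, Tuple


def run_policy(events: List[str]) -> Tuple[int, bool, List[str]]:
    """Monitor a behavior against the 'no send after read f' policy.

    Returns (policy_state, aborted, allowed_prefix)."""
    state = 0
    allowed: List[str] = []
    for ev in events:
        if state == 0 and ev == "read f":
            state = 1                       # transition on read f
        elif state == 1 and ev.startswith("send"):
            return state, True, allowed     # send in state 1: abort
        # every other event leaves the policy state unchanged (no-op)
        allowed.append(ev)
    return state, False, allowed
```

For example, a send before the read is accepted, while a send after the read terminates the behavior at the offending event.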

Colimits can be used to enforce policies specified by a policy automaton. 
However, there are interesting issues that arise. The foremost is that the effect 
of enforcing this policy is to sometimes cause the program to abort (terminate 
abnormally) when the system would otherwise continue normally. There are two 
problems here: (1) how to handle conflicting constraints on the system (here the
system may satisfy constraints that conflict with the policy), and (2) how to
define an appropriate notion of refinement (morphism) that allows termination
of behaviors.

One approach to handling conflicting requirements is to treat system require- 
ments as having a linear priority order. The idea is that a system satisfies a prior- 
ity ordering of constraints if whenever the system fails to satisfy one constraint 
C, then it must satisfy some other higher-priority constraint. For example, it 
is often the case in system code that safety and security constraints dominate 
functional constraints. We make this approach more precise in the following. 

Let (R, <) be a linearly ordered set of temporal formulas [14], and S a
program. We say that a behavior b satisfies R if for each formula F in R, either b
satisfies F or it satisfies some other formula G ∈ R such that F < G. S satisfies
R if every behavior of S satisfies R.

Technically, there is no extra expressive power in priority-ordered require- 
ments. Consider the simplest situation in which there are just two requirements 
A and B together with the order A < B. An equivalent specification has the two 
requirements A ∨ B and B without an order. Clearly this notion of satisfaction is
weak, since it admits programs that satisfy B but not A. While one could pursue 
this to obtain a stronger theoretical definition of satisfaction (e.g. by considering 
maximal satisfaction of dominated constraints), we take a pragmatic approach 
that addresses the problem via the design process. That is, our approach will 
be to perform design starting with the bottommost requirements of the order — 
typically these are the basic functional constraints. Then, we iteratively select 
dominating requirements in order and enforce them by colimit in the evolving 
design. In this way, whenever we enforce a requirement, the composition process 
will only override dominated constraints. The result is a design that will tend to 
satisfy the base functionality requirements as much as possible, but with some 
behaviors that accord with overriding policy constraints. 
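
The definition of priority-ordered satisfaction can be sketched in Python, assuming for illustration that a behavior is a finite list of event names and that each requirement is a predicate on behaviors rather than a temporal formula. The names satisfies, sends and no_send_after_read are hypothetical. With two requirements A < B the check coincides with (A ∨ B) ∧ B, as stated above.

```python
from typing import Callable, List, Sequence

Behavior = List[str]
Requirement = Callable[[Behavior], bool]


def satisfies(behavior: Behavior, requirements: Sequence[Requirement]) -> bool:
    """requirements is listed from lowest to highest priority.

    Each requirement must hold unless some higher-priority one holds."""
    for i, f in enumerate(requirements):
        higher = requirements[i + 1:]
        if not (f(behavior) or any(g(behavior) for g in higher)):
            return False
    return True


def sends(b: Behavior) -> bool:
    """A: a functional requirement, the behavior sends a message."""
    return "send" in b


def no_send_after_read(b: Behavior) -> bool:
    """B: a security requirement, no send occurs after a read."""
    seen_read = False
    for ev in b:
        if ev == "read":
            seen_read = True
        elif ev == "send" and seen_read:
            return False
    return True
```

The behavior ["read"] satisfies the ordered pair although it never sends, which illustrates the weakness of the notion: B alone is enough.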

Mobile code provides a clear scenario in which this bottom-up design ap- 
proach makes sense. Mobile code typically cannot be designed to anticipate all 
environments that it might run in. One host environment may have local policies 
that must be enforced, and it can do so by, say, composing the policies at the 
byte-code level at upload time. This way, the local environment’s policies are 
maintained even if it means disallowing behaviors of the mobile code that might 
be acceptable in other environments. 

Our point here is not to fully define a new approach to program satisfaction, 
nor a new design methodology, but simply to show another context in which 
composition by colimit provides basic support to system development. 

The second problem mentioned above, a suitable notion of refinement that 
allows behavior termination, can be addressed as follows. In the category of
especs [16, 17], abstract states are given by specifications and abstract
transitions are modeled by suitable morphisms; a state machine is then a diagram
over a category of specs. Each abstract state naturally has the identity
self-transition, which is the identity morphism on specs. Semantically, the
behaviors of such an espec include arbitrary stuttering (no-op transitions that
do not change state). In the literature, behaviors that stutter are often ruled
out, although they play a crucial role in refinement. We propose to go farther
and admit all such stuttering behaviors, including behaviors in which the
machine stutters forever on some
state. There are at least two reasons to adopt this rather loose semantics. First, 
it allows us to model failure in the underlying computation substrate. Most for- 
mal models of behavior assume a perfect computational model and ignore the 
unreliability of the hardware/software platform on which software executes. Sec- 
ond, it allows us to treat as refinement the notion of policy enforcement that 
works by terminating bad behaviors. In both cases, the idea is that for any 
behavior that successfully reaches a final/accepting state (or does so infinitely 
often), the semantics also includes all prefixes of that behavior. Each proper 
prefix corresponds to a computation that is terminated (due to failure of com- 
putational service, or to policing action, etc.). As a consequence, we obtain the 
conventional notion of trace-containment semantics for refinement. That is, every 
behavior of the codomain machine (including abnormally terminated behaviors) 
maps-to/simulates a behavior of the domain machine. 

Enforcement of a policy automaton occurs in two stages. In the first stage, 
static analysis is used to simulate the automaton by matching the event patterns 
against the control-flow of the system source code. Recent progress has pro- 
duced scalable low-order polynomial time algorithms for policy simulation [1016]. 
These algorithms work by simulating the policy forward through the source code, 
recording the policy states and transitions in labels on the control-flow graph 
of the source code. When matching a policy transition labeled event/action, if 
the event pattern matches a source-code transition, then the policy transition 
(instantiated with the bindings from the match) is associated with the source 
transition. The algorithms terminate when a fixpoint is reached. 

In effect, static analysis creates a refinement of the policy automaton that has 
the same essential shape as the source code, thus enabling automatic composition 
by colimit. 

Consider for example the code 

int c;
if c=0 then read f; 
send m; 


which is represented by a state machine in Figure 2. The figure also shows the
results of policy simulation/analysis — each state of the code is labeled with 
the states of the policy automaton that it could possibly be in for some input, 
and each transition is labeled with the set of possible policy transitions that it 
simulates for some input. 
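
The forward simulation can be sketched as a small fixpoint computation in Python. Everything here is an illustrative assumption rather than the published algorithm: the control-flow graph of the example is encoded by hand in EDGES with hypothetical node names n0..n3, events are strings, and step hard-codes the policy automaton with transition names alpha, beta, gamma and delta.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]     # (source node, event, target node)

# Hand-written control-flow graph of the three-line example program.
EDGES: List[Edge] = [
    ("n0", "c=0", "n1"),
    ("n1", "read f", "n2"),
    ("n0", "c!=0", "n2"),
    ("n2", "send m", "n3"),
]


def step(q: int, event: str) -> Tuple[str, Optional[int]]:
    """One policy transition: (transition name, next state or None on abort)."""
    if q == 0:
        return ("beta", 1) if event == "read f" else ("alpha", 0)
    if event.startswith("send"):
        return ("delta", None)
    return ("gamma", 1)


def analyze(edges: List[Edge], entry: str):
    """Label nodes with possible policy states and edges with policy
    transitions, iterating until nothing changes (a fixpoint)."""
    states: Dict[str, Set[int]] = {entry: {0}}
    labels: Dict[Edge, Set[str]] = {e: set() for e in edges}
    changed = True
    while changed:
        changed = False
        for edge in edges:
            src, event, dst = edge
            for q in list(states.get(src, ())):
                name, q2 = step(q, event)
                if name not in labels[edge]:
                    labels[edge].add(name)
                    changed = True
                if q2 is not None and q2 not in states.setdefault(dst, set()):
                    states[dst].add(q2)
                    changed = True
    return states, labels
```

On this graph the node before the send is labeled with both policy states, and the send edge with both an allowed and an aborting policy transition, which is the ambiguity caused by the conditional.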

Conceptually, the static analysis sets up a cospan in the category of especs
[16, 17]. Figure 3 shows both the cospan and the cocone. The static analysis
allows us to set up a refinement of the policy automaton (shown on the right of
the cospan) and the abstract shape that is common to the source code and the
policy instance. In the example, the key feature is the policy ambiguity that
results from the conditional: after the conditional, the system state is in either
policy state 0 or 1 depending on which branch was taken. Crucially then, the
send command is either (i) acceptable, if the policy state is 0, or (ii) forbidden,
if the policy state is 1. The policy instance automaton reflects this by recording
the policy transitions that correspond to system transitions.

Fig. 2. Results of Static Analysis

Fig. 3. Colimit to Enforce Policy

Computing the pushout has the essential effect of enforcing the security pol- 
icy in the source code. Finally, program synthesis processes are applied to the 
pushout specification and the result is translated back to the following source- 
level code: 


int c;
int s;  /* state variable */
s := 0;
if c=0
  then {read f || s := 1}
if s=0
  then send m
  else abort;


In the example, the composition results in the code aborting when c = 0. 
The pushout object is a refinement of both the policy and the source code. 
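
For readers who want to trace the synthesized code, here is a direct Python transliteration under simplifying assumptions: read, send and abort are modelled by appending to a trace instead of performing effects, and the parallel composition of the read with the state update is rendered sequentially. The function name run_enforced is hypothetical.

```python
from typing import List


def run_enforced(c: int) -> List[str]:
    """Trace of the policy-enforced program for a given value of c."""
    trace: List[str] = []
    s = 0                        # policy state variable
    if c == 0:
        trace.append("read f")   # system step ...
        s = 1                    # ... composed with the policy state update
    if s == 0:
        trace.append("send m")
    else:
        trace.append("abort")
    return trace
```

The trace for c = 0 ends in an abort, while any other value of c sends the message as in the original program.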

The approach outlined above applies to a given software design, and has the
effect of aborting behaviors that are forbidden by policy. While we can formulate
this process in terms of pushouts in an appropriate category, there are pros and
cons to this approach. It makes sense to use this approach with code of unknown
provenance that must be made to conform to local policies (e.g. mobile code or
services supplied over the Internet). However, for bespoke code the framework
gives the developer too much freedom: it doesn't provide incentives for the
programmer to find ways to satisfy both the functional requirements and the
safety and security policies. Our view is that good designers will develop an
architecture that supports the kinds of policies that can be expected for the
system. The effect of policy enforcement would then be to add the details of
the policy to the appropriate architectural mechanisms. A good example is access
control. There are standard architectures for access control [80] that prescribe the
mediation of a guard in any access to a resource that requires some protection.
The design pattern puts the requisite structure in place and the colimit composes
in the policy details.


6 Concluding Remarks 


Our goal has been to show how composition by colimit can play a fundamental 
role in software development by refinement. The benefits of these foundations 
include enhanced productivity through automated code generation, enhanced 
assurance due to the correct-by-construction characteristic of refinement-based 
derivations, and potentially enhanced software quality and performance due to
automated application of codified best-practice design knowledge.
Abstract. The COL institution (constructor-based observational logic)
has been introduced as a formal framework to specify both generation- 
and observation-oriented properties of software systems. In this paper 
we consider behavioral refinement relations between COL-specifications 
taking into account implementation constructions. We propose a general 
strategy for proving the correctness of such refinements by reduction to 
(standard) first-order theorem proving with induction. Technically our 
strategy relies on appropriate proof rules and on a lifting construction to 
encode the reachability and observability notions of the COL institution. 


1 Introduction 


Within the theory of algebraic specifications, behavioral (or observational)
aspects of software systems have been considered for more than twenty years in
many approaches in the literature. One of the first studies exposing the
importance of a behavioral view for the formalization of implementation notions
was provided by Goguen and Meseguer in [11]. It is motivated by many examples
which show that it is essential to abstract from internal implementation
details and to rely only on the observable behavior of programs.

As discussed in [7], behavioral refinement concepts can be classified into two
principal trends. The first one, pursued e.g. in [18, 19, 17, 3], uses an explicit
behavioral abstraction operator to relax the standard model class semantics of
the specification to be implemented. The second one uses specifications with
built-in features to express behavioral properties. Examples are the hidden
algebra institution developed by Goguen and his research group (see e.g. [12]),
the CafeOBJ language [9] and the COL institution (constructor-based
observational logic [4]). Each of these approaches is equipped with a notion of
signature containing a distinguished set of observer operations (to build
observable experiments) and with a notion of behavioral satisfaction such that
the equality symbol is interpreted by the observational equality of elements
(where two elements of an algebra are observationally equal if they cannot be
distinguished by observable experiments). In the COL institution, signatures
contain, in addition to the observers, a distinguished set of constructor
operations which specify those elements which are of interest from the user's
point of view, thus determining a subpart of an algebra (called the
constructor-generated part). Hence a COL-signature
$\Sigma_{COL} = (\Sigma, OP_{Cons}, OP_{Obs})$ consists of a (standard)
many-sorted signature $\Sigma = (S, OP)$ together with distinguished sets
$OP_{Cons}$ of constructor operations and $OP_{Obs}$ of observer operations. The
behavioral satisfaction of formulas is then further relaxed to the
COL-satisfaction relation $\models_{\Sigma_{COL}}$, which takes into account
only constructor-generated values for the valuation of variables (thus
abstracting from junk values).

A simple refinement relation between COL-specifications can be defined if
the specification $SP_{COL}$ to be implemented and the implementing specification
$SPI_{COL}$ have the same COL-signature $\Sigma_{COL}$. In this case $SPI_{COL}$
is a behavioral refinement of $SP_{COL}$ if its model class $Mod[SPI_{COL}]$ is
included in the model class $Mod[SP_{COL}]$ of $SP_{COL}$. To prove the
correctness of the refinement one has to show, assuming that $SP_{COL}$ is a
(flat) specification of the form $(\Sigma_{COL}, Ax)$, that
$Mod[SPI_{COL}] \models_{\Sigma_{COL}} Ax$, i.e. that all models of $SPI_{COL}$
behaviorally satisfy the axioms of the abstract specification $SP_{COL}$. For
this purpose one can directly apply the proof techniques for behavioral
consequences of COL-specifications developed in [4]. In general, however, the
assumption that both specifications have the same signature is much too
restrictive, because an implementation usually involves some construction steps;
this led to the concept of a constructor implementation introduced in [19]. In
the context of the COL institution this idea is formalized by the notion of a
COL-implementation constructor $\kappa_{COL}$ which can be applied to the models
of the implementing specification $SPI_{COL}$ to produce models of the
specification $SP_{COL}$ to be implemented. Hence, to prove the correctness of
the refinement one has to show

$$(*) \quad \kappa_{COL}(Mod[SPI_{COL}]) \models_{\Sigma_{COL}} Ax$$

with $SP_{COL} = (\Sigma_{COL}, Ax)$ as above. Unfortunately there is no obvious
way of discharging this proof obligation since we cannot expect that
$\kappa_{COL}$ is compatible with COL-satisfaction, i.e. we cannot reduce the
proof to $Mod[SPI_{COL}] \models_{\Sigma I_{COL}} Ax^*$ (with an appropriate
syntactic adjustment $Ax^*$ of $Ax$ and $\Sigma I_{COL}$ being the signature of
$SPI_{COL}$). For instance, if we consider an implementation of sets by lists,
$\kappa_{COL}$ would be the (standard) reduct functor along a (standard)
signature morphism that would not preserve the usual observer operations, where
sets are observed by the membership test isin and lists are observed by head
and tail. Hence, the reduct used for the implementation construction would not
be compatible with the COL-satisfaction relations for sets and lists,
respectively. These considerations are in accordance with Goguen's and Malcolm's
study on the difference between vertical signature morphisms used for
refinements and horizontal signature morphisms used for modular constructions of
system specifications; see [15].

In this paper we propose a strategy to discharge the proof obligation (*) which
consists of two major steps. In the first step (see Section 4) we show that instead
of the COL-specification $SPI_{COL}$ used for the implementation it is sufficient
to consider the (standard) first-order specification $SPI$ obtained from
$SPI_{COL}$ by forgetting the observer and constructor operations. For the
correctness of the corresponding proof rule it is essential that
COL-implementation constructors preserve observational equivalences between
algebras (a property which is strongly related to Schoett's notion of stability;
see [20]).

As a consequence of the first step it remains to show that the (standard)
models of $SPI$ behaviorally satisfy the axioms $Ax$ of the specification
$SP_{COL}$ to be implemented. Therefore, in the next step (see Section 5), we
investigate how the proof of behavioral consequences (w.r.t.
$\models_{\Sigma_{COL}}$) of an arbitrary class of $\Sigma$-algebras can be
reduced to standard first-order reasoning (plus induction). Technically this is
achieved by a "lifting" construction providing an appropriate axiomatization of
observational equalities and generated parts. Our proof techniques are
illustrated by an example (see Section 6) considering a behavioral refinement of
sets by non-redundant lists.





2 Basic Concepts 


In this section we summarize the basic concepts that are needed to study be- 
havioral refinements of COL-specifications and corresponding proof techniques. 


2.1 Algebraic Preliminaries 


We assume that the reader is familiar with the basic notions of algebraic
specifications (see, e.g., [22, 14, 1]), like the notions of (many-sorted)
signature $\Sigma = (S, OP)$ (where $S$ is a set of sorts and $OP$ is a set of
operation symbols $op : s_1, \ldots, s_n \to s$), signature morphism
$\sigma : \Sigma \to \Sigma'$, (total) $\Sigma$-algebra
$A = ((A_s)_{s \in S}, (op^A)_{op \in OP})$, $\Sigma$-term algebra $T_\Sigma(X)$
over a family $X = (X_s)_{s \in S}$ of pairwise disjoint sets $X_s$ of variables
of sort $s$ and interpretation $I_\alpha : T_\Sigma(X) \to A$ w.r.t. a valuation
$\alpha : X \to A$. The class of all $\Sigma$-algebras is denoted by
$Alg(\Sigma)$. Together with $\Sigma$-morphisms this class forms a category
which, for simplicity, is also denoted by $Alg(\Sigma)$. For any signature
morphism $\sigma : \Sigma \to \Sigma'$, the reduct functor
$\_|_\sigma : Alg(\Sigma') \to Alg(\Sigma)$ is defined as usual and the reduct of
a $\Sigma'$-algebra $A$ w.r.t. $\sigma$ is denoted by $A|_\sigma$. In particular,
the reduct of $A$ to a subsignature $\Sigma \subseteq \Sigma'$ is denoted by
$A|_\Sigma$. In the following we assume that signatures are finite.

The notion of an institution was introduced by Goguen and Burstall [13]
to formalize the general concept of a logical system from a model-theoretic
point of view; see [21] for an overview. An important example is the institution
FOLEq of many-sorted first-order logic with equality as detailed, e.g., in [2].
In FOLEq signatures are many-sorted signatures, models are $\Sigma$-algebras and
sentences are arbitrary first-order $\Sigma$-formulas. The satisfaction of a
first-order $\Sigma$-formula $\varphi$ by a $\Sigma$-algebra $A$, denoted by
$A \models \varphi$, is defined as usual in the first-order predicate calculus
with equality. The notation $A \models \varphi$ is extended in a straightforward
way to classes of algebras and sets of formulas. The institution CFOLEq is an
extension of the FOLEq institution where, in addition to first-order sentences,
we consider as extra sentences sort-generation constraints of the form
$SGC(S_{Cons}, OP_{Cons})$. A $\Sigma$-algebra $A$ satisfies a sort-generation
constraint $SGC(S_{Cons}, OP_{Cons})$ if it is reachable w.r.t. $OP_{Cons}$, i.e.
if each element of a carrier set $A_s$ with constrained sort $s \in S_{Cons}$ can
be constructed by the interpretations of the constructors $OP_{Cons}$ starting
from constants and from arbitrary elements of non-constrained sorts, if any. It
is well-known that a free sort-generation constraint is just an abbreviation for
the corresponding sort-generation constraint plus a finite set of first-order
sentences to state that all distinct constructor terms (up to variable renaming)
denote distinct values. Therefore, in the following, we will also assume that
the CFOLEq institution is equipped with free sort-generation constraints of the
form $FSGC(S_{Cons}, OP_{Cons})$, with the meaning described above (see
[16, pp. 152-153]).
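
For a finite algebra, reachability with respect to a set of constructors can be checked by a simple closure computation. The Python sketch below is an illustration under stated assumptions: carriers are finite sets keyed by sort name, a constructor is a hypothetical triple (argument sorts, result sort, function), and the names generated_part and satisfies_sgc are not taken from the paper.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Constructor = Tuple[Tuple[str, ...], str, Callable[..., object]]


def generated_part(carriers: Dict[str, Set[object]],
                   constructors: List[Constructor],
                   constrained: Set[str]) -> Dict[str, Set[object]]:
    """Close the empty sets of constrained sorts under the constructors.

    Non-constrained sorts contribute all of their elements."""
    gen = {s: (set() if s in constrained else set(carriers[s]))
           for s in carriers}
    changed = True
    while changed:
        changed = False
        for arg_sorts, result, fn in constructors:
            # product() snapshots its inputs, so adding to gen is safe here
            for args in product(*(gen[s] for s in arg_sorts)):
                value = fn(*args)
                if value not in gen[result]:
                    gen[result].add(value)
                    changed = True
    return gen


def satisfies_sgc(carriers, constructors, constrained) -> bool:
    """True if every element of each constrained sort is reachable."""
    gen = generated_part(carriers, constructors, constrained)
    return all(gen[s] == carriers[s] for s in constrained)
```

For instance, on the carrier {0, ..., 5} with constructors zero and x ↦ (x + 2) mod 6, only the even numbers are generated, so the constraint fails; with x ↦ (x + 1) mod 6 it holds.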

Any institution provides a suitable framework to define specifications. The
semantics of a specification $SP$ is determined by its signature, denoted by
$Sig[SP]$, and by its class of models, denoted by $Mod[SP]$. In this paper we
will only consider basic specifications $(\Sigma, Ax)$ consisting of a signature
$\Sigma$ and a set $Ax$ of $\Sigma$-sentences, also called the axioms of the
specification, with semantics:

$$Sig[(\Sigma, Ax)] \stackrel{def}{=} \Sigma$$
$$Mod[(\Sigma, Ax)] \stackrel{def}{=} \{M \in Mod(\Sigma) \mid M \models_\Sigma Ax\}$$





Notations. If $SP$ is a specification and $\varphi$ is a $Sig[SP]$-sentence, we
write $SP \models \varphi$ for $Mod[SP] \models \varphi$ and similarly for sets of
$Sig[SP]$-sentences. In the context of the CFOLEq institution we will also
consider the sum $SP_1 + SP_2$ of two specifications $SP_1$ and $SP_2$ with
semantics:

$$Sig[SP_1 + SP_2] \stackrel{def}{=} Sig[SP_1] \cup Sig[SP_2]$$
$$Mod[SP_1 + SP_2] \stackrel{def}{=} \{A \in Alg(Sig[SP_1] \cup Sig[SP_2]) \mid A|_{Sig[SP_1]} \in Mod[SP_1] \text{ and } A|_{Sig[SP_2]} \in Mod[SP_2]\}$$

By analogy, for any class $C$ of $\Sigma$-algebras and any specification $SP$, we
denote by $C + SP$ the class of $(\Sigma \cup Sig[SP])$-algebras defined by:

$$C + SP \stackrel{def}{=} \{A \in Alg(\Sigma \cup Sig[SP]) \mid A|_\Sigma \in C \text{ and } A|_{Sig[SP]} \in Mod[SP]\}$$











2.2 A Brief Introduction to the Constructor-Based Observational 
Logic COL 


The COL institution has been introduced as a formal framework to capture the 
observational aspects of system specifications; see [4]. The basic idea is to con- 
sider distinguished sets of constructor and observer operations. Intuitively, the 
constructor operations determine those elements which are of interest from the 
user’s point of view while the observer operations determine a set of observ- 
able experiments that a user can perform to examine hidden states. Thus we 
can abstract from junk elements and also from concrete state representations 
whereby two states are considered to be “observationally equal” if they cannot 
be distinguished by observable experiments. 

Formally, a constructor operation is an operation symbol
$cons : s_1, \ldots, s_n \to s$ with $n \ge 0$. The result sort $s$ of $cons$ is
called a constrained sort. An observer operation is a pair $(obs, i)$ where $obs$
is an operation symbol $obs : s_1, \ldots, s_n \to s$ with $n \ge 1$ and
$1 \le i \le n$. The distinguished argument sort $s_i$ of $obs$ is called a
state sort (or hidden sort). If $obs : s_1 \to s$ is a unary observer we simply
write $obs$ instead of $(obs, 1)$. A COL-signature
$\Sigma_{COL} = (\Sigma, OP_{Cons}, OP_{Obs})$ consists of a standard many-sorted
signature $\Sigma = (S, OP)$ together with a distinguished set
$OP_{Cons} \subseteq OP$ of constructor operations and a distinguished set
$OP_{Obs}$ of observer operations $(obs, i)$ with $obs \in OP$. We implicitly
assume in the following that whenever we consider a COL-signature $\Sigma_{COL}$
the underlying (standard) signature is $\Sigma$, and similarly for
$\Sigma'_{COL}$ etc.


The set $S_{Cons} \subseteq S$ of constrained sorts (w.r.t. $OP_{Cons}$) consists of all
sorts $s$ such that there exists at least one constructor in $OP_{Cons}$ with range $s$.
The set $S_{Loose} \subseteq S$ of loose sorts consists of all non-constrained sorts,
i.e. $S_{Loose} = S \setminus S_{Cons}$. The set $S_{State} \subseteq S$ of state sorts
(or hidden sorts, w.r.t. $OP_{Obs}$) consists of all sorts $s_i$ such that there exists
at least one observer $(obs, i)$ in $OP_{Obs}$, $obs : s_1, \ldots, s_i, \ldots, s_n \to s$.
The set $S_{Obs} \subseteq S$ of observable sorts consists of all sorts which are not a
state sort, i.e. $S_{Obs} = S \setminus S_{State}$. An observer $(obs, i) \in OP_{Obs}$,
$obs : s_1, \ldots, s_i, \ldots, s_n \to s$ is called a direct observer if
$s \in S_{Obs}$, otherwise it is an indirect observer.
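The four sort classes and the direct/indirect distinction can be computed mechanically from a COL-signature. The following is a minimal sketch, not taken from the paper: the dictionary encoding of a signature and the function name `classify_sorts` are assumptions made for illustration. The sample data is the signature of SET-COL from Section 6.

```python
def classify_sorts(sorts, ops, constructors, observers):
    """Return (S_Cons, S_Loose, S_State, S_Obs, direct, indirect).

    ops maps an operation name to (argument_sorts, result_sort);
    observers is a list of (name, i) pairs with i counted from 1, as in the text.
    This encoding is a hypothetical one chosen for the sketch.
    """
    s_cons = {ops[c][1] for c in constructors}            # ranges of constructors
    s_loose = set(sorts) - s_cons                          # S \ S_Cons
    s_state = {ops[name][0][i - 1] for (name, i) in observers}
    s_obs = set(sorts) - s_state                           # S \ S_State
    direct = [(n, i) for (n, i) in observers if ops[n][1] in s_obs]
    indirect = [(n, i) for (n, i) in observers if ops[n][1] not in s_obs]
    return s_cons, s_loose, s_state, s_obs, direct, indirect


# Signature of SET-COL (Section 6): constructors empty, add; observer (isin, 2).
sorts = ["bool", "elem", "set"]
ops = {
    "true": ((), "bool"),
    "false": ((), "bool"),
    "empty": ((), "set"),
    "add": (("elem", "set"), "set"),
    "remove": (("elem", "set"), "set"),
    "isin": (("elem", "set"), "bool"),
}
result = classify_sorts(sorts, ops, ["empty", "add"], [("isin", 2)])
```

For this signature the only constrained sort and the only state sort is `set`, and `(isin, 2)` is a direct observer because its result sort `bool` is observable.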


The set $OP_{Cons}$ of constructor operations (of a COL-signature $\Sigma_{COL}$)
determines a set of constructor terms. A constructor term is a term $t$ of a
constrained sort $s \in S_{Cons}$ which is built only from constructor operations of
$OP_{Cons}$ and from variables of loose sorts. In particular, if all sorts are
constrained, i.e., $S_{Cons} = S$, the constructor terms are exactly the
$(S, OP_{Cons})$-ground terms which are built by the constructor symbols. The set of
constructor terms determines, for any $\Sigma$-algebra $A$, an $S$-sorted family of
subsets of the carrier sets of $A$, called the generated part and denoted by
$Gen_{\Sigma_{COL}}(A)$. For each constrained sort $s \in S_{Cons}$, the corresponding
subset $Gen_{\Sigma_{COL}}(A)_s \subseteq A_s$ consists of those elements that can be
constructed by the interpretations of the given constructors (starting from constants
and from arbitrary elements of loose sorts, if any). For each loose sort
$s \in S_{Loose}$, $Gen_{\Sigma_{COL}}(A)_s = A_s$. The $\Sigma_{COL}$-generated part
represents those elements which are of interest from the user's point of view according
to the given constructor operations. A $\Sigma$-algebra $A$ is reachable (w.r.t.
$\Sigma_{COL}$) if its carrier sets coincide with its $\Sigma_{COL}$-generated part.
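For a finite algebra the generated part can be computed as a least fixpoint. The sketch below assumes a hypothetical encoding (carriers as Python sets, operations as triples of argument sorts, result sort and a function); the toy algebra with the constructors `zero` and `succ2` is invented for illustration and does not appear in the paper.

```python
from itertools import product


def generated_part(carriers, ops, constructors):
    """Compute the generated part of a finite algebra.

    carriers: sort -> set of elements
    ops: name -> (argument_sorts, result_sort, function)
    constructors: names of the constructor operations
    """
    constrained = {ops[c][1] for c in constructors}
    # Loose sorts contribute their whole carrier; constrained sorts start empty.
    gen = {s: (set() if s in constrained else set(elems))
           for s, elems in carriers.items()}
    changed = True
    while changed:  # iterate until no constructor yields a new element
        changed = False
        for c in constructors:
            arg_sorts, result_sort, fn = ops[c]
            for args in product(*(list(gen[s]) for s in arg_sorts)):
                value = fn(*args)
                if value not in gen[result_sort]:
                    gen[result_sort].add(value)
                    changed = True
    return gen


def is_reachable(carriers, ops, constructors):
    """An algebra is reachable if every carrier equals its generated part."""
    gen = generated_part(carriers, ops, constructors)
    return all(gen[s] == set(elems) for s, elems in carriers.items())


# Hypothetical algebra: odd numbers are junk w.r.t. the constructors zero, succ2.
carriers = {"nat": {0, 1, 2, 3, 4, 5}}
ops = {
    "zero": ((), "nat", lambda: 0),
    "succ2": (("nat",), "nat", lambda n: (n + 2) % 6),
}
gen = generated_part(carriers, ops, ["zero", "succ2"])
```

Here only the even numbers are generated, so this algebra is not reachable; the odd numbers are exactly the junk elements that COL abstracts from.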


The set $OP_{Obs}$ of observer operations (of a COL-signature $\Sigma_{COL}$) determines
a set of observable contexts which represent the observable experiments that a
user can perform. Observable contexts are defined in a coinductive style which
will be reflected in the encoding of observable contexts in Section 5.


Definition 1 (Observable context). Let $\Sigma_{COL}$ be a COL-signature, let
$X = (X_s)_{s \in S}$ be a family of pairwise disjoint, countably infinite sets $X_s$ of
variables of sort $s$ and let $Z = (\{z_s\})_{s \in S_{State}}$ be a disjoint family of
singleton sets (one for each state sort). The sets $\mathcal{C}(\Sigma_{COL})_{s \to s'}$
of observable $\Sigma_{COL}$-contexts with "application sort" $s$ and "observable result
sort" $s'$, with $s \in S_{State}$ and $s' \in S_{Obs}$, are the least sets such that:

1. For each direct observer $(obs, i)$ with $obs : s_1, \ldots, s_i, \ldots, s_n \to s'$
   and pairwise disjoint variables $x_1{:}s_1, \ldots, x_n{:}s_n$,
   $obs(x_1, \ldots, x_{i-1}, z_{s_i}, x_{i+1}, \ldots, x_n) \in \mathcal{C}(\Sigma_{COL})_{s_i \to s'}$.

2. For each observable context $c \in \mathcal{C}(\Sigma_{COL})_{s \to s'}$, for each
   indirect observer $(obs, i)$ with $obs : s_1, \ldots, s_i, \ldots, s_n \to s$, and
   pairwise disjoint variables $x_1{:}s_1, \ldots, x_n{:}s_n$ not occurring in $c$,
   $c[obs(x_1, \ldots, x_{i-1}, z_{s_i}, x_{i+1}, \ldots, x_n)/z_s] \in \mathcal{C}(\Sigma_{COL})_{s_i \to s'}$
   where $c[obs(x_1, \ldots, x_{i-1}, z_{s_i}, x_{i+1}, \ldots, x_n)/z_s]$ denotes the
   term obtained from $c$ by substituting the term
   $obs(x_1, \ldots, x_{i-1}, z_{s_i}, x_{i+1}, \ldots, x_n)$ for $z_s$.

We assume that for any state sort $s \in S_{State}$ there exists an observable context
with application sort $s$.
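The two rules of Definition 1 can be read as an enumeration procedure: start from the direct observers and repeatedly substitute an indirect observer for the hole. The following sketch enumerates contexts as strings up to a bound on the number of observers. The encoding, the helper names, and the list signature with `head` and `tail` are assumptions made for illustration; operation names are assumed not to contain the marker character `@`.

```python
def observable_contexts(ops, observers, max_observers):
    """Enumerate observable contexts built from at most max_observers (>= 1) observers.

    ops maps a name to (argument_sorts, result_sort); observers is a list of
    (name, i) pairs with i counted from 1. Returns a list of triples
    (application_sort, result_sort, term_as_string).
    """
    state_sorts = {ops[n][0][i - 1] for (n, i) in observers}
    fresh = [0]   # counter used to generate fresh variable names
    hole = "@"    # marks the unique occurrence of the state variable z

    def apply_observer(name, i):
        args = []
        for k, sort in enumerate(ops[name][0], start=1):
            if k == i:
                args.append(hole)
            else:
                fresh[0] += 1
                args.append(f"x{fresh[0]}:{sort}")
        return f"{name}({', '.join(args)})"

    # Rule 1: every direct observer is a context.
    level = [(ops[n][0][i - 1], ops[n][1], apply_observer(n, i))
             for (n, i) in observers if ops[n][1] not in state_sorts]
    found = list(level)
    for _ in range(max_observers - 1):
        # Rule 2: substitute an indirect observer with result sort s for z_s.
        level = [(ops[n][0][i - 1], res, term.replace(hole, apply_observer(n, i)))
                 for (app, res, term) in level
                 for (n, i) in observers if ops[n][1] == app]
        found.extend(level)
    return [(app, res, term.replace(hole, f"z_{app}")) for (app, res, term) in found]


# Hypothetical list signature with the usual observers head and tail.
list_ops = {"head": (("list",), "elem"), "tail": (("list",), "list")}
contexts = observable_contexts(list_ops, [("head", 1), ("tail", 1)], 3)
```

With `head` as direct and `tail` as indirect observer, the enumeration yields `head(z_list)`, `head(tail(z_list))`, `head(tail(tail(z_list)))`, and so on as the bound grows.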


The set of observable contexts determines, for any $\Sigma$-algebra $A$, an
indistinguishability relation, called observational equality. The observational
equality on $A$ is an $S$-sorted binary relation $\approx_{\Sigma_{COL},A}$ such that for
any two elements $a, b \in A$, $a \approx_{\Sigma_{COL},A} b$ holds if either $a = b$ and
$a, b$ are observable (i.e. belong to a carrier set of observable sort $s \in S_{Obs}$)
or if $a$ and $b$ are hidden (i.e. belong to a carrier set of a state sort
$s \in S_{State}$) but cannot be distinguished by the application of observable
contexts. The application of observable contexts is defined in the usual way apart
from the fact that for variables in $X$ (occurring in an observable context $c$) we
consider only valuations in the generated part $Gen_{\Sigma_{COL}}(A)$ (i.e. junk
values are disregarded because they should not contribute to distinguishing
elements). A $\Sigma$-algebra $A$ is fully abstract if the observational equality
coincides (on all carrier sets) with the set-theoretic equality.
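As a small illustration, consider a hypothetical algebra in which the state sort is carried by Python tuples and the only observer is a membership test, so that every observable context has the form `isin(x, z)`. The names below are assumptions for this sketch, and the element sort is taken to be loose, so all elements are admissible valuations for `x`.

```python
def isin(x, lst):
    """Interpretation of the single observer: membership test."""
    return x in lst


def observationally_equal(l1, l2, elems):
    """l1 and l2 are equal iff no context isin(x, z), with x in elems, separates them."""
    return all(isin(x, l1) == isin(x, l2) for x in elems)


elems = {0, 1}
a, b, c = (0, 1), (1, 0, 0), (1,)
same_ab = observationally_equal(a, b, elems)
same_ac = observationally_equal(a, c, elems)
```

The tuples `a` and `b` are different elements of the carrier but observationally equal, so this algebra is not fully abstract; `a` and `c` are separated by the experiment `isin(0, z)`.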

The constructor and the observer operations induce certain constraints on
$\Sigma$-algebras. First, since the constructor operations determine the values of
interest, we require that the non-constructor operations should (up to observational
equality) respect the constructor-generated part of an algebra, i.e. by the
application of non-constructor operations one should at most be able to obtain
elements which are observationally equal to some element of the constructor-generated
part $Gen_{\Sigma_{COL}}(A)$. Technically this means that for a given $\Sigma$-algebra
$A$ we first consider the smallest $\Sigma$-subalgebra
$\langle Gen_{\Sigma_{COL}}(A) \rangle_{\Sigma}$ of $A$ containing the
$\Sigma_{COL}$-generated part because this subalgebra represents the only elements
a user can compute (over the loose carrier sets) by invoking operations of $\Sigma$.
Then we require that each element of $\langle Gen_{\Sigma_{COL}}(A) \rangle_{\Sigma}$ is
observationally equal to some element of the $\Sigma_{COL}$-generated part
$Gen_{\Sigma_{COL}}(A)$ of $A$. This condition is called reachability constraint.

Furthermore, since the declaration of observer operations determines a particular
observational equality on any $\Sigma$-algebra $A$, the (interpretations of the)
non-observer operations should respect this observational equality, i.e. a
non-observer operation should not contribute to distinguishing non-observable
elements. To ensure this we require that the observational equality is a
$\Sigma$-congruence on the subalgebra $\langle Gen_{\Sigma_{COL}}(A) \rangle_{\Sigma}$.
(It is sufficient to consider $\langle Gen_{\Sigma_{COL}}(A) \rangle_{\Sigma}$ instead
of $A$ because computations performed by a user can only lead to elements in the
$\Sigma$-subalgebra $\langle Gen_{\Sigma_{COL}}(A) \rangle_{\Sigma}$.) This condition is
called observability constraint.




A $\Sigma$-algebra $A$ which satisfies both the reachability and the observability
constraints induced by a COL-signature $\Sigma_{COL}$ is called a $\Sigma_{COL}$-algebra
(or simply a COL-algebra). Obviously any $\Sigma$-algebra $A$ which is reachable and
fully abstract w.r.t. $\Sigma_{COL}$ is a $\Sigma_{COL}$-algebra. The class of all
$\Sigma_{COL}$-algebras is denoted by $Alg_{COL}(\Sigma_{COL})$. It can be extended to a
category by an appropriate notion of $\Sigma_{COL}$-morphism which reflects behavioral
relationships between $\Sigma_{COL}$-algebras (see [4] for details).

The satisfaction of the reachability and observability constraints allows us
to construct for each $\Sigma_{COL}$-algebra $A$ its black box view which is a reachable
and fully abstract algebra representing the behavior of $A$ from the user's point
of view. The black box view is constructed in two steps. First, we restrict to
the $\Sigma_{COL}$-generated subalgebra $\langle Gen_{\Sigma_{COL}}(A) \rangle_{\Sigma}$
of $A$, thus forgetting junk values. Then, we identify all elements of
$\langle Gen_{\Sigma_{COL}}(A) \rangle_{\Sigma}$ which are observationally equal. Hence
the black box view of a $\Sigma_{COL}$-algebra $A$ is given by the quotient algebra of
$\langle Gen_{\Sigma_{COL}}(A) \rangle_{\Sigma}$ w.r.t. $\approx_{\Sigma_{COL},A}$ which,
for simplicity, will be denoted by $A/{\approx_{\Sigma_{COL},A}}$. Two
$\Sigma_{COL}$-algebras $A$ and $B$ are observationally equivalent, denoted by
$A \equiv_{\Sigma_{COL}} B$, if their black box views $A/{\approx_{\Sigma_{COL},A}}$ and
$B/{\approx_{\Sigma_{COL},B}}$ are isomorphic $\Sigma$-algebras. Observationally
equivalent $\Sigma_{COL}$-algebras are isomorphic w.r.t. $\Sigma_{COL}$-morphisms (see [4]).
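The two steps of the black box construction can be sketched for a hypothetical list-based algebra observed only by membership. To keep the carrier finite the sketch bounds the length of the lists, which is an artificial restriction made only for illustration; the function name and encoding are likewise assumptions.

```python
from itertools import product


def black_box_classes(elems, max_len):
    """Group constructor-generated lists into observational equivalence classes."""
    ordered = sorted(elems)
    # Step 1: the generated part, here all lists of length <= max_len over elems.
    generated = [t for n in range(max_len + 1) for t in product(ordered, repeat=n)]
    # Step 2: identify lists that give the same answers to every membership test.
    classes = {}
    for lst in generated:
        observations = tuple(x in lst for x in ordered)
        classes.setdefault(observations, []).append(lst)
    return classes


classes = black_box_classes({0, 1}, 2)
```

Seven lists collapse into four classes, one for each subset of `{0, 1}`, which matches the intended view of these lists as sets.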

A crucial concept to obtain a built-in behavioral semantics for specifications is
the COL-satisfaction relation, denoted by $\models_{\Sigma_{COL}}$, which generalizes
the standard satisfaction relation of first-order logic by abstracting with respect
to reachability and observability. First, from the reachability point of view, the
valuations of variables are restricted to the elements of the $\Sigma_{COL}$-generated
part only. From the observability point of view, the idea is to interpret the
equality symbol "=" occurring in a first-order formula $\varphi$ not by the
set-theoretic equality but by the observational equality of elements.





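Definition 2 (COL-satisfaction relation). For any COL-signature $\Sigma_{COL}$,
the COL-satisfaction relation between $\Sigma$-algebras and first-order
$\Sigma$-formulas (with variables in $X$) is denoted by $\models_{\Sigma_{COL}}$ and
defined as follows. Let $A \in Alg(\Sigma)$.

1. For any two terms $t, r \in T_{\Sigma}(X)_s$ of the same sort $s$ and for any
   valuation $\alpha : X \to Gen_{\Sigma_{COL}}(A)$, $A, \alpha \models_{\Sigma_{COL}} t = r$
   holds if $I_{\alpha}(t) \approx_{\Sigma_{COL},A} I_{\alpha}(r)$.

2. For any $\Sigma$-formula $\varphi$ and for any valuation
   $\alpha : X \to Gen_{\Sigma_{COL}}(A)$, $A, \alpha \models_{\Sigma_{COL}} \varphi$ is
   defined by induction over the structure of the formula $\varphi$ in the usual way.
   In particular, $A, \alpha \models_{\Sigma_{COL}} \forall x{:}s.\ \varphi$ if for all
   valuations $\beta : X \to Gen_{\Sigma_{COL}}(A)$ with $\beta(y) = \alpha(y)$ for all
   $y \neq x$, $A, \beta \models_{\Sigma_{COL}} \varphi$.

3. For any $\Sigma$-formula $\varphi$, $A \models_{\Sigma_{COL}} \varphi$ holds if for
   all valuations $\alpha : X \to Gen_{\Sigma_{COL}}(A)$,
   $A, \alpha \models_{\Sigma_{COL}} \varphi$ holds.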

















The notation $A \models_{\Sigma_{COL}} \varphi$ is extended in the usual way to classes
of algebras and sets of formulas. The COL-satisfaction relation is defined not only
for $\Sigma_{COL}$-algebras but also for arbitrary $\Sigma$-algebras, which will be
important when we consider proof techniques for behavioral refinement relations.
Fact 1 Let $\Sigma_{COL}$ be a COL-signature, let $\varphi$ be a $\Sigma$-formula and
let $A$ be a $\Sigma_{COL}$-algebra. Then:

$A \models_{\Sigma_{COL}} \varphi$ if and only if $A/{\approx_{\Sigma_{COL},A}} \models \varphi$.


The above definitions provide the basic ingredients that lead to the COL
institution. In particular, the COL-satisfaction relation satisfies the satisfaction
condition of institutions w.r.t. COL-signature morphisms which are standard
signature morphisms fulfilling additional properties related to the preservation
of constructor and observer operations (see [4] for details). A basic
COL-specification $SP_{COL} = \langle\Sigma_{COL}, Ax\rangle$ consists of a COL-signature
$\Sigma_{COL}$ and a set $Ax$ of $\Sigma$-sentences (the axioms of the specification).
The semantics of $SP_{COL}$ is given by its signature $\Sigma_{COL}$ and by its class
of models:

$Mod[SP_{COL}] \stackrel{\mathrm{def}}{=} \{A \in Alg_{COL}(\Sigma_{COL}) \mid A \models_{\Sigma_{COL}} Ax\}$.





3 Behavioral Refinements 


Generally, specification refinement is a relation between an abstract specification
to be implemented and a more concrete specification which satisfies the
requirements of the given abstract specification. Taking into account the observable
behavior described by COL-specifications, a COL-specification $SPI_{COL}$ is
considered as a behavioral refinement of a COL-specification $SP_{COL}$ if
$SPI_{COL}$ respects the behavioral properties required by $SP_{COL}$. Formally, a
simple behavioral refinement relation between two COL-specifications can be defined
by requiring that both specifications have the same signature and that the model
class of the implementing specification $SPI_{COL}$ is included in the model class of
$SP_{COL}$. Remember that for the sake of simplicity we restrict to basic
specifications in the framework of this paper.


Definition 3 (Behavioral refinement: simple case).
Let $SP_{COL} = \langle\Sigma_{COL}, Ax\rangle$ and
$SPI_{COL} = \langle\Sigma_{COL}, AxI\rangle$ be two COL-specifications with the same
signature $\Sigma_{COL}$. $SPI_{COL}$ is a behavioral refinement of $SP_{COL}$,
denoted by $SP_{COL} \leadsto SPI_{COL}$, if

$Mod[SPI_{COL}] \subseteq Mod[SP_{COL}]$.

To prove that $SP_{COL} \leadsto SPI_{COL}$ holds, one has to show that:

$SPI_{COL} \models_{\Sigma_{COL}} \varphi$ for all axioms $\varphi \in Ax$,

i.e. that the axioms of $SP_{COL}$ are observable consequences of $SPI_{COL}$. For this
purpose one can directly apply the proof techniques for COL-specifications studied
in [4] (since $\Sigma_{COL}$ is also the signature of $SPI_{COL}$).
In general, however, one has to take into account that an implementation
involves some construction step, an idea which has been formalized by the no- 
tion of constructor implementation introduced in [19] (and similarly in other 
implementation concepts; see [17,10] for an overview). According to [19] an im- 
plementation constructor is a function which maps algebras over the signature 
of the implementing specification to algebras over the signature of the abstract 
specification. Since it is sufficient if an implementation construction is defined
on the models of the implementing specification, implementation constructors
are, in general, partial functions. We assume that implementation constructions
are performed in a uniform way, i.e. preserve isomorphisms. It is obvious that 
the concept of an implementation constructor can be easily transferred to be- 
havioral refinements of COL-specifications. In particular, the requirement that 
isomorphisms are preserved means in the context of the COL institution that 
a COL-implementation constructor preserves COL-isomorphisms, i.e. observa- 
tional equivalences of COL-algebras (see Section 2.2). 


Definition 4 (COL-implementation constructor). Let $\Sigma_{COL}$, $\Sigma I_{COL}$ be
two COL-signatures. A COL-implementation constructor from $\Sigma I_{COL}$ to
$\Sigma_{COL}$ is a partial function
$\kappa_{COL} : Alg_{COL}(\Sigma I_{COL}) \to Alg_{COL}(\Sigma_{COL})$ which is
COL-iso-preserving, i.e. for all $AI, BI \in Alg_{COL}(\Sigma I_{COL})$,

if $AI \equiv_{\Sigma I_{COL}} BI$ and $\kappa_{COL}(AI)$ is defined
then $\kappa_{COL}(BI)$ is defined and $\kappa_{COL}(AI) \equiv_{\Sigma_{COL}} \kappa_{COL}(BI)$.

The definition domain of $\kappa_{COL}$ is denoted by $Dom(\kappa_{COL})$.


Using the notion of a COL-implementation constructor we can generalize 
Definition 3 to the case where the abstract and implementing specifications have 
different signatures. 


Definition 5 (Behavioral refinement w.r.t. an implementation constructor).
Let $SP_{COL}$, $SPI_{COL}$ be two COL-specifications with signatures $\Sigma_{COL}$,
$\Sigma I_{COL}$ resp. and let $\kappa_{COL}$ be a COL-implementation constructor from
$\Sigma I_{COL}$ to $\Sigma_{COL}$. $SPI_{COL}$ is a behavioral refinement of $SP_{COL}$
w.r.t. $\kappa_{COL}$, denoted by $SP_{COL} \leadsto^{\kappa_{COL}} SPI_{COL}$, if

$Mod[SPI_{COL}] \subseteq Dom(\kappa_{COL})$ and $\kappa_{COL}(Mod[SPI_{COL}]) \subseteq Mod[SP_{COL}]$.


As discussed in [7] an important question is, of course, which implementation
constructors are appropriate for behavioral refinements. As a first approach
one could simply consider COL-signature morphisms
$\sigma_{COL} : \Sigma_{COL} \to \Sigma I_{COL}$. Since COL is an institution, the
corresponding COL-reduct functor
$\_|_{\sigma_{COL}} : Alg_{COL}(\Sigma I_{COL}) \to Alg_{COL}(\Sigma_{COL})$ preserves
COL-isomorphisms, i.e. is a COL-implementation constructor. Hence it is tempting to
consider COL-refinements where the syntactic relationship between the specification
$SP_{COL}$ to be implemented and the implementing specification $SPI_{COL}$ is
established by a COL-signature morphism. This approach has, however, a serious
drawback because the implementing specification $SPI_{COL}$ usually has constructor
and observer operations $OPI_{Cons}$, $OPI_{Obs}$ which are unrelated to the
constructor and observer operations $OP_{Cons}$, $OP_{Obs}$ of the specification
$SP_{COL}$ to be implemented. As a simple example we consider in Section 6 the
implementation of sets by lists where the observer for sets is the membership test
isin while the observer operations for lists are, as usual, the head and tail
operations. Hence the COL-specifications of sets and lists cannot be related by a
COL-signature morphism which would require the preservation of constructor and
observer operations. This is the reason why we want to consider standard signature
morphisms and their reduct functors as implementation constructors for
COL-specifications.

Before doing so, let us point out that our viewpoint has been inspired by
the following remarkable sentences by Goguen and Malcolm [15]: "Signature
morphisms perform two distinct roles. One role is to express the importation of
one specification into another ... referred to as horizontal composition ... so that
when a specification of a class of objects is imported into a larger specification, the
properties of the imported object classes are preserved. The other role performed
by signature morphisms is to compare two different specifications. This is referred
to as vertical composition, and pertains to relationships between layers ... In such
a case we would not expect that signature morphisms encapsulate object class
specifications, but rather expect that signature morphisms preserve the behaviour
of object classes ...".

Interpreting these considerations in the COL framework this means that it 
is indeed adequate not to stick to COL-signature morphisms when we construct 
implementations. COL-signature morphisms are the appropriate tool to ensure 
encapsulation of COL-specifications (formally expressed by the satisfaction con- 
dition of an institution) which is indeed important when we construct large de- 
sign specifications in a modular way (i.e. by horizontal composition). But when 
we discuss refinements by relating abstract and concrete specifications (vertical 
composition) this is a totally different matter where it makes no sense to talk 
about encapsulation. 

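Let us now consider two COL-specifications $SP_{COL}$, $SPI_{COL}$ with signatures
$\Sigma_{COL}$, $\Sigma I_{COL}$ resp. together with a (standard) signature morphism
$\sigma : \Sigma \to \Sigma I$ (where $\Sigma$ and $\Sigma I$ are the underlying
standard signatures of $\Sigma_{COL}$ and $\Sigma I_{COL}$ resp.). Moreover, let us
consider the reduct functor $\_|_{\sigma} : Alg(\Sigma I) \to Alg(\Sigma)$ as a partial
function $\_|_{\sigma} : Alg_{COL}(\Sigma I_{COL}) \to Alg_{COL}(\Sigma_{COL})$, where:

$\_|_{\sigma}(AI) \stackrel{\mathrm{def}}{=} AI|_{\sigma}$ if $AI|_{\sigma}$ is a $\Sigma_{COL}$-algebra,

$\_|_{\sigma}(AI)$ is undefined otherwise.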


Then we have the following fact (see Lemma 1 in [7]). 


Fact 2 $\_|_{\sigma} : Alg_{COL}(\Sigma I_{COL}) \to Alg_{COL}(\Sigma_{COL})$ is a
COL-implementation constructor if $\sigma(S_{Obs}) \subseteq SI_{Obs}$ and
$\sigma(S_{Loose}) \subseteq SI_{Loose}$ where $S_{Obs}$, $SI_{Obs}$ are the observable
sorts and $S_{Loose}$, $SI_{Loose}$ are the loose sorts induced by $\Sigma_{COL}$,
$\Sigma I_{COL}$ respectively (see Section 2.2).
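The sufficient condition of Fact 2 is a purely syntactic check on sorts. A minimal sketch follows; the function name is hypothetical, and the sort names of the list signature (with `list` as its only constrained and only state sort) are assumed for illustration, since the list specification itself is given elsewhere.

```python
def satisfies_fact2_condition(sigma, s_obs, s_loose, si_obs, si_loose):
    """Check sigma(S_Obs) within SI_Obs and sigma(S_Loose) within SI_Loose.

    sigma maps each sort of the abstract signature to a sort of the
    implementing signature. The condition is sufficient, not necessary.
    """
    image_obs = {sigma[s] for s in s_obs}
    image_loose = {sigma[s] for s in s_loose}
    return image_obs.issubset(si_obs) and image_loose.issubset(si_loose)


# Sets (observable and loose sorts: bool, elem) implemented by lists.
sigma = {"bool": "bool", "elem": "elem", "set": "list"}
ok = satisfies_fact2_condition(
    sigma, {"bool", "elem"}, {"bool", "elem"}, {"bool", "elem"}, {"bool", "elem"})

# A morphism sending the observable sort elem to the state sort list fails.
bad_sigma = {"bool": "bool", "elem": "list", "set": "list"}
bad = satisfies_fact2_condition(
    bad_sigma, {"bool", "elem"}, {"bool", "elem"}, {"bool", "elem"}, {"bool", "elem"})
```

Note that the check involves sorts only: the constructor and observer operations of the two signatures need not be related, which is exactly what distinguishes this setting from COL-signature morphisms.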


1 By abuse of notation we use the same symbol $\_|_{\sigma}$ for the (total) reduct
functor on $Alg(\Sigma I)$ and for its induced partial reduct function on
$Alg_{COL}(\Sigma I_{COL})$.




Let us stress that vertical signature morphisms used for refinements in [15] 
satisfy the above conditions due to the fixed universe of visible data. Hence 
vertical signature morphisms in the sense of [15] are special cases of COL- 
implementation constructors. 


4 Proof Rules for Behavioral Refinements: Part I 


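In the following we are interested in proof rules for proving behavioral refinement
relations $SP_{COL} \leadsto^{\kappa_{COL}} SPI_{COL}$. Obviously, the following basic
proof rule follows directly from Definition 5:

For any COL-specifications $SP_{COL} = \langle\Sigma_{COL}, Ax\rangle$,
$SPI_{COL} = \langle\Sigma I_{COL}, AxI\rangle$, and COL-implementation constructor
$\kappa_{COL} : Alg_{COL}(\Sigma I_{COL}) \to Alg_{COL}(\Sigma_{COL})$:

\[
\frac{\text{(B1)}\ Mod[SPI_{COL}] \subseteq Dom(\kappa_{COL}), \qquad
      \text{(B2)}\ \kappa_{COL}(Mod[SPI_{COL}]) \models_{\Sigma_{COL}} Ax}
     {SP_{COL} \leadsto^{\kappa_{COL}} SPI_{COL}}
\quad \text{(Basic)}
\]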


Note that in (B2) $\kappa_{COL}(Mod[SPI_{COL}])$ consists of $\Sigma$-algebras and that
$\models_{\Sigma_{COL}}$ has been defined not only for COL-algebras but for arbitrary
$\Sigma$-algebras. Of course, the central question is how to prove (B1) and (B2). For
this purpose, we will follow a strategy which consists of two crucial steps. The idea
of the first step, elaborated in this section, is to consider instead of the
COL-specification $SPI_{COL}$ a standard specification $SPI$ (over the FOLEq
institution) and instead of $\kappa_{COL}$ an implementation constructor $\kappa$ on
standard algebras. This idea is related to the (behavioral) refinement notion in [20]
and to the concept of an abstractor implementation in [19] where behavioral
refinement is, by definition, a relation between the standard interpretation of the
implementing specification and the behavioral interpretation of the specification to
be implemented. The idea of the second step, elaborated in Section 5, is to reduce
the proof of consequences w.r.t. the COL-satisfaction relation
$\models_{\Sigma_{COL}}$ to proofs w.r.t. the standard satisfaction relation of
first-order logic with equality.








Let us start by considering COL-implementation constructors which are induced by
standard implementation constructors. Given two signatures $\Sigma$ and $\Sigma I$,
a standard implementation constructor from $\Sigma I$ to $\Sigma$ is a function
$\kappa : Alg(\Sigma I) \to Alg(\Sigma)$ which is iso-preserving. For simplicity, let us
assume that $\kappa$ is total. Since any COL-algebra is also a (standard) algebra it
is obvious that any implementation constructor $\kappa : Alg(\Sigma I) \to Alg(\Sigma)$
gives rise to a (partial) function
$\kappa_{COL} : Alg_{COL}(\Sigma I_{COL}) \to Alg_{COL}(\Sigma_{COL})$ where:

$\kappa_{COL}(AI) \stackrel{\mathrm{def}}{=} \kappa(AI)$ if $\kappa(AI)$ is a $\Sigma_{COL}$-algebra,
$\kappa_{COL}(AI)$ is undefined otherwise.
If this partial function is COL-iso-preserving then $\kappa_{COL}$ is a
COL-implementation constructor induced by $\kappa$.$^2$ For instance, Fact 2 provides
a simple criterion when reduct functors along standard signature morphisms induce
COL-implementation constructors.

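To state our second proof rule we consider for any COL-specification $SPI_{COL}$
its associated standard specification $SPI$ obtained by forgetting the constructor
and observer operations declared in $SPI_{COL}$. Then we have for any specifications
$SP_{COL} = \langle\Sigma_{COL}, Ax\rangle$, $SPI_{COL} = \langle\Sigma I_{COL}, AxI\rangle$,
$SPI = \langle\Sigma I, AxI\rangle$, where $\Sigma I$ is the underlying standard
signature of $\Sigma I_{COL}$, and for any $\kappa : Alg(\Sigma I) \to Alg(\Sigma)$ and
COL-implementation constructor
$\kappa_{COL} : Alg_{COL}(\Sigma I_{COL}) \to Alg_{COL}(\Sigma_{COL})$ induced by $\kappa$:

\[
\frac{\text{(F1)}\ \kappa(Mod[SPI]) \subseteq Alg_{COL}(\Sigma_{COL}), \qquad
      \text{(F2)}\ \kappa(Mod[SPI]) \models_{\Sigma_{COL}} Ax}
     {\text{(B1)}\ Mod[SPI_{COL}] \subseteq Dom(\kappa_{COL}), \qquad
      \text{(B2)}\ \kappa_{COL}(Mod[SPI_{COL}]) \models_{\Sigma_{COL}} Ax}
\quad (\text{Forget}_{COL})
\]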




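Lemma 1. The proof rule (Forget$_{COL}$) is correct.

Proof. Assume (F1) and (F2). To prove (B1) and (B2) let $AI \in Mod[SPI_{COL}]$.
Then $AI \models_{\Sigma I_{COL}} AxI$ and hence, by Fact 1,
$AI/{\approx_{\Sigma I_{COL},AI}} \models AxI$. Thus
$AI/{\approx_{\Sigma I_{COL},AI}} \in Mod[SPI]$. By assumption (F1),
$\kappa(AI/{\approx_{\Sigma I_{COL},AI}})$ is a $\Sigma_{COL}$-algebra. Hence, since
$\kappa_{COL}$ is induced by $\kappa$, $\kappa_{COL}(AI/{\approx_{\Sigma I_{COL},AI}})$ is
defined. Since $AI \equiv_{\Sigma I_{COL}} AI/{\approx_{\Sigma I_{COL},AI}}$ and
$\kappa_{COL}$ is a COL-implementation constructor, $\kappa_{COL}(AI)$ is defined as
well. Hence, $Mod[SPI_{COL}] \subseteq Dom(\kappa_{COL})$, i.e. (B1) holds.

Moreover, since $AI/{\approx_{\Sigma I_{COL},AI}} \in Mod[SPI]$, the assumption (F2)
implies that $\kappa(AI/{\approx_{\Sigma I_{COL},AI}}) \models_{\Sigma_{COL}} Ax$. Then,
since $\kappa_{COL}$ is induced by $\kappa$, we have
$\kappa_{COL}(AI/{\approx_{\Sigma I_{COL},AI}}) \models_{\Sigma_{COL}} Ax$. Since
$\kappa_{COL}$ is a COL-implementation constructor and
$AI \equiv_{\Sigma I_{COL}} AI/{\approx_{\Sigma I_{COL},AI}}$, we conclude that
$\kappa_{COL}(AI) \equiv_{\Sigma_{COL}} \kappa_{COL}(AI/{\approx_{\Sigma I_{COL},AI}})$.
But then $\kappa_{COL}(AI) \models_{\Sigma_{COL}} Ax$ holds as well. Hence,
$\kappa_{COL}(Mod[SPI_{COL}]) \models_{\Sigma_{COL}} Ax$, i.e. (B2) holds.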
































The proof rule (Forget$_{COL}$) is also complete if $Mod[SPI]$ is closed under
behavioral quotients, i.e. if any $\Sigma I$-algebra $AI \in Mod[SPI]$ is a
$\Sigma I_{COL}$-algebra such that $AI/{\approx_{\Sigma I_{COL},AI}} \in Mod[SPI]$.
(The proof relies on the fact that, under this assumption,
$Mod[SPI] \subseteq Mod[SPI_{COL}]$.)

According to the given proof rules (Basic) and (Forget$_{COL}$), the remaining
task to prove behavioral refinements is to prove (F1) and (F2). A possible approach
to discharge (F1) will be explained with the example in Section 6. A
general technique to discharge (F2) is studied in the next section.


2 In particular this means that $\kappa$ is compatible with observational equivalences
between COL-algebras, a property which is related to the notion of stability
introduced by Schoett [20].
5 Proof Rules for Behavioral Refinements: Part II


In this section we focus on how to handle the proof obligation (F2) arising from
the proof rule (Forget$_{COL}$). Basically we have to show that a set of formulas is
behaviorally satisfied by some class of arbitrary algebras. The difficulty here is
that these algebras are not COL-algebras w.r.t. the same COL-signature as the
one used for the behavioral satisfaction considered, hence we cannot reuse the
ideas and proof techniques detailed in [4]. However, we can rely on another idea,
similar to the one introduced in [5], where the proof of behavioral consequences
is replaced by the proof of standard consequences using a so-called "lifting
encoding". The main difference to [5] is, first, that in COL we have distinguished
sets of observer and constructor operations which lead to far fewer observable
contexts and constructor terms than in the case of partial observational equalities
considered in [5]. Hence, the ideas of [5], which were mainly of theoretical
interest, now become practically relevant. Secondly, in contrast to [5], we follow
a coinductive style for the encoding of observable contexts which is more
appropriate for proving behavioral theorems.

Our lifting encoding relies on a syntactic counterpart of both the constructor
terms and the observable contexts. Therefore we need a few preliminary definitions.
Remember that given a COL-signature $\Sigma_{COL}$, for each state sort $s$ and
observable sort $s'$, $\mathcal{C}(\Sigma_{COL})_{s \to s'}$ denotes the set of the
observable $\Sigma_{COL}$-contexts with application sort $s$ and result sort $s'$.


Definition 6 (Lifted signature $\Delta\mathcal{L}(\Sigma_{COL})$ associated to a
COL-signature $\Sigma_{COL}$). Let $\Sigma_{COL} = (\Sigma, OP_{Cons}, OP_{Obs})$ be a
COL-signature. The induced lifted signature $\Delta\mathcal{L}(\Sigma_{COL})$ is
defined as follows:

$\Delta\mathcal{L}(\Sigma_{COL}) \stackrel{\mathrm{def}}{=} \Sigma \cup \Delta(OP_{Cons}) \cup \Delta\mathcal{L}(OP_{Cons}) \cup \Delta(OP_{Obs}) \cup \Delta\mathcal{L}(OP_{Obs})$

where $\Delta(OP_{Cons})$ is the signature fragment containing:
- for each constrained sort $s \in S_{Cons}$, a new sort $c[s]$;
- for each constructor $cons : s_1, \ldots, s_n \to s \in OP_{Cons}$,
  a new operation $cons^c : \bar{s}_1, \ldots, \bar{s}_n \to c[s]$
  where here and in the following, for any sort $r \in S$,
  $\bar{r} \stackrel{\mathrm{def}}{=} r$ if $r \in S_{Loose}$ and
  $\bar{r} \stackrel{\mathrm{def}}{=} c[r]$ if $r \in S_{Cons}$;
where $\Delta\mathcal{L}(OP_{Cons})$ is the signature fragment containing:
- for each constrained sort $s \in S_{Cons}$,
  a new (overloaded) operation $inj : c[s] \to s$;
- for each constrained sort $s \in S_{Cons}$,
  a new (overloaded) unary predicate $G : s$ on the sort $s$;
where $\Delta(OP_{Obs})$ is the signature fragment containing:
- for each state sort $s \in S_{State}$ and observable sort $s' \in S_{Obs}$, if
  $\mathcal{C}(\Sigma_{COL})_{s \to s'}$ is not empty, a new sort $Cont[s{\to}s']$;$^3$
- for each direct observer $(obs, i) \in OP_{Obs}$ with
  $obs : s_1, \ldots, s_i, \ldots, s_n \to s'$, a new operation
  $obs_i^c : \bar{s}_1, \ldots, \bar{s}_{i-1}, \bar{s}_{i+1}, \ldots, \bar{s}_n \to Cont[s_i{\to}s']$;$^4$
- for each indirect observer $(obs, i) \in OP_{Obs}$ with
  $obs : s_1, \ldots, s_i, \ldots, s_n \to s$, and for all observable sorts
  $s' \in S_{Obs}$ such that $\mathcal{C}(\Sigma_{COL})_{s \to s'}$ is not empty,$^5$
  new (overloaded) operations
  $obs_i^c : Cont[s{\to}s'], \bar{s}_1, \ldots, \bar{s}_{i-1}, \bar{s}_{i+1}, \ldots, \bar{s}_n \to Cont[s_i{\to}s']$;
and where $\Delta\mathcal{L}(OP_{Obs})$ is the signature fragment containing:
- for each new sort $Cont[s{\to}s']$, a new (overloaded) operation
  $apply : Cont[s{\to}s'], s \to s'$;
- for each state sort $s \in S_{State}$, a new (overloaded) binary predicate
  $\sim\ : s, s$.

3 Otherwise, i.e. if $\mathcal{C}(\Sigma_{COL})_{s \to s'}$ is empty, no new sort is
added to reduce the syntactic complexity of the encoding.


Definition 7 (Lifting axioms $Ax(\Sigma_{COL})$ associated to a COL-signature
$\Sigma_{COL}$). Let $\Sigma_{COL} = (\Sigma, OP_{Cons}, OP_{Obs})$ be a COL-signature.
The lifting axioms $Ax(\Sigma_{COL})$ associated to the lifted signature
$\Delta\mathcal{L}(\Sigma_{COL})$ introduced in Definition 6 are defined as follows:

$Ax(\Sigma_{COL}) \stackrel{\mathrm{def}}{=} SGC(\Delta(OP_{Cons})) \cup Ax_{\Sigma_{COL}}(inj) \cup Ax_{\Sigma_{COL}}(G) \cup FSGC(\Delta(OP_{Obs})) \cup Ax_{\Sigma_{COL}}(apply) \cup Ax_{\Sigma_{COL}}(\sim)$

where $SGC(\Delta(OP_{Cons}))$ is the sort-generation constraint induced by the new
sorts $c[s]$ and by the new operations $cons^c$;
where $Ax_{\Sigma_{COL}}(inj)$ states that the operations $inj$ are injective and
homomorphic w.r.t. $OP_{Cons}$, i.e. $Ax_{\Sigma_{COL}}(inj)$ is the union of:
- for each constrained sort $s \in S_{Cons}$, the conditional equation:
  $\forall x, y{:}c[s].\ inj(x) = inj(y) \Rightarrow x = y$;
- for each constrained sort $s \in S_{Cons}$ and constructor $cons \in OP_{Cons}$ with
  $cons : s_1, \ldots, s_n \to s$, the implicitly universally quantified equation:
  $inj(cons^c(x_1, \ldots, x_n)) = cons(Cx_1, \ldots, Cx_n)$
  where here and in the following, $Cx_i = x_i$ if the sort of $x_i$ is in $S_{Loose}$,
  and $Cx_i = inj(x_i)$ otherwise (i.e., if the sort of $x_i$ is of the form $c[s_i]$
  with $s_i \in S_{Cons}$);
where $Ax_{\Sigma_{COL}}(G)$ is the set of sentences:
- for each constrained sort $s \in S_{Cons}$:
  $\forall x{:}s.\ G(x) \Leftrightarrow \exists y{:}c[s].\ x = inj(y)$;
where $FSGC(\Delta(OP_{Obs}))$ is the free sort-generation constraint induced by the
signature fragment $\Delta(OP_{Obs})$, i.e., by the new sorts $Cont[s{\to}s']$ and the
new operations $obs_i^c$;
where $Ax_{\Sigma_{COL}}(apply)$ is the set of equations:
- for each direct observer $(obs, i) \in OP_{Obs}$ with
  $obs : s_1, \ldots, s_i, \ldots, s_n \to s'$, the equation:
  $\forall x_1{:}\bar{s}_1, \ldots, x_{i-1}{:}\bar{s}_{i-1}, x_{i+1}{:}\bar{s}_{i+1}, \ldots, x_n{:}\bar{s}_n.\ \forall x_i{:}s_i.$
  $apply(obs_i^c(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n), x_i) = obs(Cx_1, \ldots, Cx_{i-1}, x_i, Cx_{i+1}, \ldots, Cx_n)$;
- for each indirect observer $(obs, i) \in OP_{Obs}$ with
  $obs : s_1, \ldots, s_i, \ldots, s_n \to s$, and for all observable sorts
  $s' \in S_{Obs}$ such that the new sort $Cont[s{\to}s']$ exists, the equations:
  $\forall c{:}Cont[s{\to}s'], x_1{:}\bar{s}_1, \ldots, x_{i-1}{:}\bar{s}_{i-1}, x_{i+1}{:}\bar{s}_{i+1}, \ldots, x_n{:}\bar{s}_n.\ \forall x_i{:}s_i.$
  $apply(obs_i^c(c, x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n), x_i) = apply(c, obs(Cx_1, \ldots, Cx_{i-1}, x_i, Cx_{i+1}, \ldots, Cx_n))$;
where $Ax_{\Sigma_{COL}}(\sim)$ is the set of sentences:
- for each state sort $s \in S_{State}$:
  $\forall x, y{:}s.\ \Big( \bigwedge_{Cont[s \to s']} \forall c{:}Cont[s{\to}s'].\ apply(c, x) = apply(c, y) \Big) \Leftrightarrow x \sim y$.

4 The existence of the direct observer $(obs, i)$ entails the non-emptiness of
$\mathcal{C}(\Sigma_{COL})_{s_i \to s'}$, hence the existence of the new sort $Cont[s_i{\to}s']$.
5 Hence, the new sort $Cont[s{\to}s']$ exists, and so does the new sort $Cont[s_i{\to}s']$.


The main idea underlying the above definitions is that, to any $\Sigma$-algebra
$A$ (where $\Sigma$ is the standard signature underlying the COL-signature
$\Sigma_{COL}$), corresponds a unique (up to isomorphism) "lifted"
$\Delta\mathcal{L}(\Sigma_{COL})$-algebra $\Delta\mathcal{L}(A)$ which extends $A$
(i.e., $\Delta\mathcal{L}(A)|_{\Sigma} = A$) and satisfies the lifting axioms
$Ax(\Sigma_{COL})$.$^7$ Moreover, this lifted algebra $\Delta\mathcal{L}(A)$ is defined
in a way which ensures a one to one correspondence between:

- values in the $\Sigma_{COL}$-generated part of the $\Sigma$-algebra $A$ and values in
  the carriers of the new sorts $c[s]$ in $\Delta\mathcal{L}(A)$;

- observable contexts, together with appropriate valuations of their variables
  (in the $\Sigma_{COL}$-generated part of $A$), and values in the (carriers of the)
  syntactic counterparts $Cont[s{\to}s']$ in $\Delta\mathcal{L}(A)$. Hence the new sorts
  $Cont[s{\to}s']$ reflect the observable contexts in
  $\mathcal{C}(\Sigma_{COL})_{s \to s'}$ and they are generated by the constructors
  $obs_i^c$. Note that our definition of the constructors of the new sorts
  $Cont[s{\to}s']$ follows the coinductive definition of observable contexts
  given in Definition 1.


We still need a further definition to state the main result of this section. 


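Definition 8 (Lifted formula $\mathcal{L}(\varphi)$). Let
$\Sigma_{COL} = (\Sigma, OP_{Cons}, OP_{Obs})$ be a COL-signature and $\varphi$ be an
arbitrary $\Sigma$-formula. The lifted formula $\mathcal{L}(\varphi)$ is the
$\Delta\mathcal{L}(\Sigma_{COL})$-formula defined by:$^8$

\[
\mathcal{L}(\varphi) \stackrel{\mathrm{def}}{=}
\Big( \bigwedge_{y:s \in FreeVar(\varphi) \text{ and } s \in S_{Cons}} G(y) \Big) \Rightarrow \varphi^*
\]

where $FreeVar(\varphi)$ denotes the free variables of $\varphi$, if any, and where
$\varphi^*$ is defined by induction on the structure of $\varphi$ as follows: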


6 This sentence is finite, since for any state sort $s \in S_{State}$, there is only a
finite number of sorts $Cont[s{\to}s']$, where $s' \in S_{Obs}$ is an observable sort.

7 In other words, the lifting axioms $Ax(\Sigma_{COL})$ induce a strongly persistent
free functor from $\Sigma$-algebras to $\Delta\mathcal{L}(\Sigma_{COL})$-algebras.

8 Note however that the only extra (not in $\Sigma$) symbols used in
$\mathcal{L}(\varphi)$ are the predicates $G$ and $\sim$. Moreover, note the similarity
with [5, Def. 4.1-(iv)].
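1. If $\varphi$ is an equation $l = r$ between two terms of sort $s$:
   if $s \in S_{Obs}$ then $\varphi^* \stackrel{\mathrm{def}}{=} l = r$, otherwise
   $s \in S_{State}$ and $\varphi^* \stackrel{\mathrm{def}}{=} l \sim r$;
2. $(\neg\varphi)^* \stackrel{\mathrm{def}}{=} \neg(\varphi^*)$,
   $(\varphi \wedge \psi)^* \stackrel{\mathrm{def}}{=} \varphi^* \wedge \psi^*$,
   $(\varphi \vee \psi)^* \stackrel{\mathrm{def}}{=} \varphi^* \vee \psi^*$;
3. If $s \in S_{Loose}$ then $(\forall x{:}s.\ \varphi)^* \stackrel{\mathrm{def}}{=} \forall x{:}s.\ \varphi^*$,
   otherwise $s \in S_{Cons}$ and
   $(\forall x{:}s.\ \varphi)^* \stackrel{\mathrm{def}}{=} \forall x{:}s.\ [G(x) \Rightarrow \varphi^*]$.
Obviously $\mathcal{L}(\varphi)$ coincides with $\varphi^*$ if $\varphi$ is a closed $\Sigma$-formula.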
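The lifting of Definition 8 is a purely syntactic transformation, so it can be sketched as a recursive function on formula trees. The tuple encoding of formulas, the function names, and the sample formula (idempotence of `add`, used here only as a hypothetical example) are assumptions made for this illustration; terms are kept as opaque strings and connectives are translated homomorphically.

```python
def star(phi, s_state, s_cons):
    """Translate phi into phi*: '=' on state sorts becomes '~', and
    quantifiers over constrained sorts are relativised to the predicate G."""
    kind = phi[0]
    if kind == "eq":                       # ("eq", lhs, rhs, sort)
        _, lhs, rhs, sort = phi
        return ("sim", lhs, rhs) if sort in s_state else phi
    if kind == "not":                      # ("not", body)
        return ("not", star(phi[1], s_state, s_cons))
    if kind in ("and", "or"):              # (kind, left, right)
        return (kind, star(phi[1], s_state, s_cons), star(phi[2], s_state, s_cons))
    if kind == "forall":                   # ("forall", var, sort, body)
        _, var, sort, body = phi
        body = star(body, s_state, s_cons)
        if sort in s_cons:
            body = ("implies", ("G", var), body)
        return ("forall", var, sort, body)
    raise ValueError(f"unknown formula constructor: {kind}")


def lift(phi, free_vars, s_state, s_cons):
    """Build L(phi); free_vars lists the free variables of phi as (name, sort)."""
    guards = [("G", y) for (y, sort) in free_vars if sort in s_cons]
    body = star(phi, s_state, s_cons)
    if not guards:
        return body                        # closed formula: L(phi) is phi*
    premise = guards[0]
    for g in guards[1:]:
        premise = ("and", premise, g)
    return ("implies", premise, body)


# Sort classes of SET-COL: 'set' is both the constrained and the state sort.
S_STATE, S_CONS = {"set"}, {"set"}
closed = ("forall", "x", "elem",
          ("forall", "s", "set",
           ("eq", "add(x, add(x, s))", "add(x, s)", "set")))
lifted_closed = lift(closed, [], S_STATE, S_CONS)
lifted_open = lift(("eq", "isin(x, s)", "true", "bool"),
                   [("x", "elem"), ("s", "set")], S_STATE, S_CONS)
```

In the closed example the equation between sets is replaced by `~` and the quantifier over `set` is guarded by `G(s)`, whereas the quantifier over the loose sort `elem` is left unchanged.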
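Theorem 3. Let $\Sigma_{COL} = (\Sigma, OP_{Cons}, OP_{Obs})$ be a COL-signature. For any
class $C \subseteq Alg(\Sigma)$ of $\Sigma$-algebras and any $\Sigma$-formula $\varphi$,
we have:
$C \models_{\Sigma_{COL}} \varphi$ if and only if
$C + \langle\Delta\mathcal{L}(\Sigma_{COL}), Ax(\Sigma_{COL})\rangle \models \mathcal{L}(\varphi)$.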


Proof. For lack of space we only detail here the main steps of the proof. 








Step 1: In a first step we introduce a semantic lifting of $\Sigma$-algebras as follows.
Let $\mathcal{L}(\Sigma)$ be the signature $\Sigma$ enriched by the predicates $G$ and
$\sim$ (as they are introduced in Definition 6). Remember that $\mathcal{L}(\varphi)$
is indeed an $\mathcal{L}(\Sigma)$-formula, as pointed out in Definition 8. Now the
semantic lifting $\mathcal{L}(A)$ of a $\Sigma$-algebra $A$ is defined as being the
unique $\mathcal{L}(\Sigma)$-algebra extension of $A$ defined by:

1. $\mathcal{L}(A)|_{\Sigma} \stackrel{\mathrm{def}}{=} A$;
2. For any constrained sort $s \in S_{Cons}$, and $a \in \mathcal{L}(A)_s = A_s$,
   $G^{\mathcal{L}(A)}(a)$ if and only if $a \in Gen_{\Sigma_{COL}}(A)_s$;
3. For any state sort $s \in S_{State}$, and $a, b \in \mathcal{L}(A)_s = A_s$,
   $a \sim^{\mathcal{L}(A)} b$ if and only if $a \approx_{\Sigma_{COL},A,s} b$.

Now we have:

$A \models_{\Sigma_{COL}} \varphi$ if and only if $\mathcal{L}(A) \models \mathcal{L}(\varphi)$.

The proof of this fact is similar to the proof of Theorem 4.2 in [5].

Step 2: In a second step we prove that, for any $\Sigma$-algebra $A$:

$\Delta\mathcal{L}(A)|_{\mathcal{L}(\Sigma)} = \mathcal{L}(A)$

This indeed results directly from Definitions 6 and 7 (see the comments
after the latter definition), and from the definitions of $Gen_{\Sigma_{COL}}(A)$ and of
$\approx_{\Sigma_{COL},A}$.

Step 3: From the above we conclude that:
$A \models_{\Sigma_{COL}} \varphi$ if and only if, according to Step 1,
$\mathcal{L}(A) \models \mathcal{L}(\varphi)$ if and only if, according to Step 2,
$\Delta\mathcal{L}(A)|_{\mathcal{L}(\Sigma)} \models \mathcal{L}(\varphi)$ if and only if,
according to the satisfaction condition in CFOLEq,
$\Delta\mathcal{L}(A) \models \mathcal{L}(\varphi)$. This is enough to conclude the proof
of the theorem, since
$\{\Delta\mathcal{L}(A) \mid A \in C\} = C + \langle\Delta\mathcal{L}(\Sigma_{COL}), Ax(\Sigma_{COL})\rangle$.


As a direct consequence of Theorem 3 we obtain the following proof rule. 
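For any COL-specification $\mathit{SP}_{\mathrm{COL}} = (\Sigma_{\mathrm{COL}}, \mathit{Ax})$, CFOLEq-specification $\mathit{SPI} = 
(\Sigma\mathit{I}, \mathit{AxI})$ and for any $\kappa : \mathrm{Alg}(\Sigma\mathit{I}) \to \mathrm{Alg}(\Sigma)$: 

$$\frac{\kappa(\mathrm{Mod}[\mathit{SPI}]) + (\mathrm{AL}(\Sigma_{\mathrm{COL}}), \mathrm{Ax}(\Sigma_{\mathrm{COL}})) \models \mathcal{L}(\mathit{Ax})}{\kappa(\mathrm{Mod}[\mathit{SPI}]) \models_{\Sigma_{\mathrm{COL}}} \mathit{Ax}} \quad \text{(Lifting)}$$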
6 Example: Implementation of Sets by Non-redundant Lists 


In this section we illustrate the use of our proof rules and proof techniques on a 
small but non-trivial example. 


6.1 The Behavioral Refinement Relation 


The following specification SET-COL specifies properties of sets over a loose 
domain of arbitrary elements. As constructors for sets we use the operations empty 
and add, and as an observer for sets we use the membership test isin.⁹ 
spec SET-COL = 
  sorts bool, elem, set 
  ops true, false : bool; 
      empty : set; 
      add : elem × set → set; 
      remove : elem × set → set; 
      isin : elem × set → bool; 
  constructors empty, add 
  observer (isin, 2) 
  axioms 
    ∀x, y : elem; s : set 
    %% standard axioms for booleans, plus 
    isin(x, empty) = false 
    isin(x, add(x, s)) = true 
    x ≠ y ⇒ isin(x, add(y, s)) = isin(x, s) 
    isin(x, remove(x, s)) = false 
    x ≠ y ⇒ isin(x, remove(y, s)) = isin(x, s) 
    add(x, add(x, s)) = add(x, s) 
    add(x, add(y, s)) = add(y, add(x, s)) 
end 


As a refinement for sets we consider a classical implementation of sets by 
non-redundant lists, where the set operation add is implemented in such a way 
that it inserts an element x into a list only if x does not yet occur in the list, and 
the set operation remove just removes the first occurrence of an element. 
spec LIST-COL = 
  sorts bool, elem, list 
  ops true, false : bool; 
      empty : list; 
      cons : elem × list → list; 
      head : list → elem; 
      tail : list → list; 
      isin : elem × list → bool; 
      add : elem × list → list; 
      remove : elem × list → list; 
  constructors empty, cons 
  observers head, tail 
  axioms 
    ∀x, y : elem; l : list 
    %% standard axioms for booleans, plus 
    head(cons(x, l)) = x 
    tail(cons(x, l)) = l 
    isin(x, empty) = false 
    isin(x, cons(x, l)) = true 
    x ≠ y ⇒ isin(x, cons(y, l)) = isin(x, l) 
    isin(x, l) = true ⇒ add(x, l) = l 
    isin(x, l) = false ⇒ add(x, l) = cons(x, l) 
    remove(x, empty) = empty 
    remove(x, cons(x, l)) = l 
    x ≠ y ⇒ remove(x, cons(y, l)) = cons(y, remove(x, l)) 
end 

9 All our examples are expressed using a syntactic sugar similar to that of CASL [8]. 


To state the refinement relation we still need an appropriate COL-implementation 
constructor. Since LIST-COL already provides all set operations, the simple 
idea is to forget the list operations cons, head and tail and to perform an 
appropriate renaming to match the sorts set and list. For this purpose we consider the 
(standard) signature morphism $\sigma_{\mathrm{SETasLIST}} : \Sigma_{\mathrm{SET}} \to \Sigma_{\mathrm{LIST}}$ where $\Sigma_{\mathrm{SET}}$ denotes 
the underlying (standard) signature of Sig[SET-COL],¹⁰ similarly, $\Sigma_{\mathrm{LIST}}$ denotes 
the underlying (standard) signature of Sig[LIST-COL], and $\sigma_{\mathrm{SETasLIST}}(\mathit{set}) = \mathit{list}$, 
$\sigma_{\mathrm{SETasLIST}}(x) = x$ otherwise. 

Since bool and elem are the observable sorts of both SET-COL and LIST-COL, 
Fact 2 implies that the reduct functor $-|_{\sigma_{\mathrm{SETasLIST}}} : \mathrm{Alg}(\Sigma_{\mathrm{LIST}}) \to \mathrm{Alg}(\Sigma_{\mathrm{SET}})$ on 
standard algebras induces a COL-implementation constructor: 

$$-|_{\sigma_{\mathrm{SETasLIST}}} : \mathrm{Alg}_{\mathrm{COL}}(\mathrm{Sig}[\text{LIST-COL}]) \to \mathrm{Alg}_{\mathrm{COL}}(\mathrm{Sig}[\text{SET-COL}]).$$

Then, we claim that LIST-COL is indeed a behavioral refinement of SET-COL 
w.r.t. $-|_{\sigma_{\mathrm{SETasLIST}}}$, i.e. $\text{SET-COL} \stackrel{-|_{\sigma_{\mathrm{SETasLIST}}}}{\leadsto} \text{LIST-COL}$. 


6.2 Proof of the Refinement 


For the proof of the above refinement relation the combination of the rules 
(Basic) and (Forget$_{\mathrm{COL}}$) provided in Section 4 shows that it is enough to 
consider the standard specification LIST obtained from LIST-COL by omitting the 
declarations of the constructors and observers. Then we have the following two 
proof obligations: 

10 i.e. $\Sigma_{\mathrm{SET}}$ consists of all sorts and operations of the COL-signature Sig[SET-COL] 
without any constructor or observer declaration. 
(F1) $\mathrm{Mod}[\text{LIST}]|_{\sigma_{\mathrm{SETasLIST}}} \subseteq \mathrm{Alg}_{\mathrm{COL}}(\mathrm{Sig}[\text{SET-COL}])$ 
(F2) $\mathrm{Mod}[\text{LIST}]|_{\sigma_{\mathrm{SETasLIST}}} \models_{\mathrm{Sig}[\text{SET-COL}]} \varphi$ for all axioms $\varphi$ of SET-COL 


To prove (F1) one has to check that the reducts of all models of LIST satisfy 
the reachability and observability constraints induced by Sig[SET-COL]; see 
Section 2. To check the reachability constraint we consider the generated parts of 
sort set which are constructed by empty and add. Obviously, due to the 
implementation of add, those parts represent lists without duplicates. Moreover, from 
the axioms of LIST it follows that the only non-constructor operation remove 
does not introduce duplicates, i.e. the constructor-generated parts are already 
subalgebras and therefore the reachability constraint is trivially satisfied. For the 
proof of the observability constraint one has to show that both non-observer 
operations add and remove are congruent, i.e. are compatible with the observational 
equality for sets. For this purpose one can use the lifting encoding considered 
below and prove the congruence axioms for $\sim$. Another strategy would be first 
to verify (F2) and then to conclude that both add and remove are congruent 
operations since the axioms of SET-COL provide sufficiently complete definitions 
for add and for remove.¹¹ 

For the proof of (F2) we will apply the rule (Lifting) of the previous section 
which says that it is sufficient to prove that for all axioms $\varphi$ of SET-COL, 

$$\mathrm{Mod}[\text{LIST}]|_{\sigma_{\mathrm{SETasLIST}}} + (\mathrm{AL}(\mathrm{Sig}[\text{SET-COL}]), \mathrm{Ax}(\mathrm{Sig}[\text{SET-COL}])) \models \mathcal{L}(\varphi)$$

or equivalently, since the (standard) satisfaction relation is compatible with 
reducts of (standard) algebras, 

$$\text{LIST}^* + (\mathrm{AL}(\mathrm{Sig}[\text{SET-COL}]), \mathrm{Ax}(\mathrm{Sig}[\text{SET-COL}])) \models \mathcal{L}(\varphi),$$

where LIST$^*$ is the same specification as LIST but with the sort list renamed 
into set. For this purpose, we first compute, according to Definition 6, the lifted 
signature: 


$$\mathrm{AL}(\mathrm{Sig}[\text{SET-COL}]) = \Sigma_{\mathrm{SET}} \cup A(\mathit{OP}_{\mathrm{Cons}}) \cup A(\mathit{OP}_{\mathrm{Cons}}) \cup A(\mathit{OP}_{\mathrm{Obs}}) \cup A(\mathit{OP}_{\mathrm{Obs}})$$


where $A(\mathit{OP}_{\mathrm{Cons}})$ consists of 
- the new sort c[set] 
- the new operations empty* : c[set]; add* : elem × c[set] → c[set]; 
where $A(\mathit{OP}_{\mathrm{Cons}})$ consists of 
- the new operation inj : c[set] → set; 
- the new unary predicate G : set; 
where $A(\mathit{OP}_{\mathrm{Obs}})$ consists of 
- the new sort Cont[set→bool]; 
- the new operation isin* : elem → Cont[set→bool]; 
and where $A(\mathit{OP}_{\mathrm{Obs}})$ consists of 
- the new operation apply : Cont[set→bool] × set → bool; 
- the new binary predicate ∼ : set × set. 


11 This idea follows a general result presented in [6] for observational logic and equa- 
tional specifications which still has to be extended to COL and conditional equations. 
In the next step, we compute, according to Definition 7, the lifted axioms: 

$$\mathrm{Ax}(\mathrm{Sig}[\text{SET-COL}]) = \mathrm{SGC}(A(\mathit{OP}_{\mathrm{Cons}})) \cup \mathrm{Ax}_{\mathrm{Sig}[\text{SET-COL}]}(\mathit{inj}) \cup \mathrm{Ax}_{\mathrm{Sig}[\text{SET-COL}]}(G) \cup \mathrm{FSGC}(A(\mathit{OP}_{\mathrm{Obs}})) \cup \mathrm{Ax}_{\mathrm{Sig}[\text{SET-COL}]}(\mathit{apply}) \cup \mathrm{Ax}_{\mathrm{Sig}[\text{SET-COL}]}(\sim)$$

where $\mathrm{SGC}(A(\mathit{OP}_{\mathrm{Cons}}))$ is the sort-generation constraint 
    generated type c[set] ::= empty* | add*(elem; c[set]); 
where $\mathrm{Ax}_{\mathrm{Sig}[\text{SET-COL}]}(\mathit{inj})$ consists of 
- the conditional equation: 
    ∀s, s' : c[set]. inj(s) = inj(s') ⇒ s = s'; 
- the implicitly universally quantified equations: 
    inj(empty*) = empty 
    inj(add*(x, s)) = add(x, inj(s)) 
where $\mathrm{Ax}_{\mathrm{Sig}[\text{SET-COL}]}(G)$ consists of 
- ∀s : set. G(s) ⇔ ∃s' : c[set]. s = inj(s'); 
where $\mathrm{FSGC}(A(\mathit{OP}_{\mathrm{Obs}}))$ is the free sort-generation constraint 
    free type Cont[set→bool] ::= isin*(elem); 
where $\mathrm{Ax}_{\mathrm{Sig}[\text{SET-COL}]}(\mathit{apply})$ is the equation: 
- ∀x : elem, s : set. apply(isin*(x), s) = isin(x, s) 
where $\mathrm{Ax}_{\mathrm{Sig}[\text{SET-COL}]}(\sim)$ is the sentence: 
- ∀s, s' : set. 
    (∀c : Cont[set→bool]. apply(c, s) = apply(c, s')) ⇔ s ∼ s' 








According to the above axioms, the unary predicate G characterizes those 
lists (of type set because of the performed renaming) which are built with empty 
and add. These lists are exactly the lists with no duplicates used for the represen- 
tation of sets. On the other hand, the above axioms provide also a specification of 
the binary predicate ~ which relates any two lists containing the same elements 
(independently of the order and the number of occurrences of these elements). 

Let us now compute the lifting $\mathcal{L}(\varphi)$ of all axioms $\varphi$ of SET-COL which leads, 
according to Definition 8, to the following set of sentences: 

    ∀x, y : elem; s : set 
    isin(x, empty) = false 
    G(s) ⇒ isin(x, add(x, s)) = true 
    G(s) ⇒ (x ≠ y ⇒ isin(x, add(y, s)) = isin(x, s)) 
    G(s) ⇒ isin(x, remove(x, s)) = false                      (1) 
    G(s) ⇒ (x ≠ y ⇒ isin(x, remove(y, s)) = isin(x, s)) 
    G(s) ⇒ add(x, add(x, s)) ∼ add(x, s) 
    G(s) ⇒ add(x, add(y, s)) ∼ add(y, add(x, s))             (2) 


Of course, the remaining task is to show that the lifted axioms given above are 
consequences of LIST$^*$ + (AL(Sig[SET-COL]), Ax(Sig[SET-COL])). In most cases 
the proof is already a direct consequence of the axioms of the LIST specification 
without the need of the predicates G and ∼. The situation is, however, different 
for the sentences (1) and (2). In the case of (1) the relativization w.r.t. G(s) 
is indeed crucial, because isin(x, remove(x, s)) = false holds only for those list 
interpretations of s which have no duplicates, but these are exactly the lists 
characterized by the predicate symbol G. In the case of (2), the use of ∼ instead 
of "=" is crucial as well, since two lists (also two non-redundant lists) are different 
if they contain the same elements but in a different order. In this case they are, 
however, observationally equal, which is axiomatized by ∼.¹² 
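
The role of G and ∼ in sentences (1) and (2) can be checked on a concrete model. The following is only an illustrative sketch in OCaml, assuming integers as elements; `obs_eq` and the finite probe list are helpers introduced here for the illustration and are not part of the specifications.

```ocaml
(* A concrete model of LIST with int elements; names follow LIST-COL. *)
let empty : int list = []

let rec isin x = function
  | [] -> false
  | y :: l -> x = y || isin x l

(* add inserts x only if it does not yet occur in the list *)
let add x l = if isin x l then l else x :: l

(* remove deletes the first occurrence only *)
let rec remove x = function
  | [] -> []
  | y :: l -> if x = y then l else y :: remove x l

(* observational equality through isin, tested on a finite list of probes
   (an assumption of this sketch; the real relation quantifies over all
   elements) *)
let obs_eq probes s s' =
  List.for_all (fun x -> isin x s = isin x s') probes

let s12 = add 1 (add 2 empty)   (* the list [1; 2] *)
let s21 = add 2 (add 1 empty)   (* the list [2; 1] *)
let dup = [1; 1]                (* not generated by empty and add *)
```

Here `s12` and `s21` differ as lists but agree on every `isin` observation, which is the situation of sentence (2); `remove 1 dup` still contains 1, which is why sentence (1) needs the guard G(s).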


7 Conclusion 


We have provided proof techniques to verify behavioral refinements of COL- 
specifications based on a reduction to first-order specifications and (standard) 
inductive reasoning. Hence, any inductive theorem prover can be used to prove 
behavioral refinements. Let us stress that we do not use coinduction to prove the 
behavioral validity of equations but we use an encoding of observational equali- 
ties and generated parts which works for arbitrary first-order formulas. Typical 
proofs of consequences of the encoding are then performed by induction on the 
(coinductive) structure of observable contexts. Next steps are the extension of 
our approach to take into account structured specifications and the study of fur- 
ther examples of implementation constructors like, e.g., specification extension. 


Acknowledgement. We are grateful to the anonymous referee of a previous 
version of this paper for valuable remarks. 
Abstract. This paper explains the design of the second release of the 
Zen toolkit [5-7]. It presents a notion of reactive engine which simulates 
finite-state machines represented as shared aums [8]. We show that it 
yields a modular interpreter for finite state machines described as local 
transducers. For instance, in the manner of Berry and Sethi, we define a 
compiler of regular expressions into a scheduler for the reactive engine, 
chaining through aums labeled with phases — associated with the letters 
of the regular expression. This gives a modular composition scheme for 
general finite-state machines. 

Many variations of this basic idea may be put to use according to 
circumstances. The simplest one is when aums are reduced to dictionaries, 
i.e. to (minimalized) acyclic deterministic automata recognizing finite 
languages. Then one may proceed to adding supplementary structure 
to the aum algebra, namely non-determinism, loops, and transduction. 
Such additional choice points require fitting some additional control to 
the reactive engine. Further parameters are required for some functional- 
ities. For instance, the local word access stack is handy as an argument to 
the output routine in the case of transducers. Internal virtual addresses 
demand the full local state access stack for their interpretation. 

A characteristic example is provided; it gives a complete analyser for 
compound substantives. It is an abstraction from a modular version of 
the Sanskrit segmenter presented in [9]. This improved segmenter uses 
a regular relation condition relating the phases of morphology genera- 
tion, and enforcing the correct geometry of morphemes. Thus we obtain 
compound nouns from iic*.(noun+iic.ifc), where iic and ifc are respectively 
the prefix and suffix substantival forms for compound formation. 


Dedicated to Joseph Goguen for his 65th birthday 


1 Regular Morphology 


We first consider the simplest framework for finite automata, where the state 
transition graph is a dictionary structure (lexical tree or trie). Such structures 
represent acyclic deterministic finite-state automata, with maximal sharing of 
initial paths. Every state is accessible from the initial state, and we may also 
assume that every state is on an accepting path. When we minimize the tree 
as a dag, we obtain the corresponding minimal deterministic automaton. Such 
automata recognize finite languages. They are adequate for representing the 
lexicons of natural languages. 

In a framework of generative morphology, we want to model the construction 
of lexemes from smaller chunks called morphemes: radical stems, prefixes and 
suffixes. It is convenient to sort the morphemes into categories, and to enforce 
structural conditions on these categories, restricting the geometry of lexemes. 
For instance, we may describe this geometry by a regular expression over the 
alphabet of lexical categories. The language generated by the regular expression, 
substituting each category by its corresponding morpheme lexicon, is recognized 
by a modular reactive engine, which chains the morphemes dictionary lookup 
with transitions corresponding to the regular expression recognizer. We shall use 
for this setup variants of the compiling algorithm of Berry and Sethi [2]. 


1.1 Automaton Interface 


We use as algorithmic description language Pidgin ML, a core applicative subset 
of Objective Caml. Thus our algorithms may be read as rigorous higher-order 
inductive definitions, while being directly executable, in the spirit of literate 
programming. 

We first recall the basic structures of the Zen toolkit [5]. 

We use as basic alphabet the natural numbers provided by the hardware 
processor: 


module Word : sig 
  type letter = int 
  and word = list letter; 
end; 


Thus the basic morphology operations will rely on list processing, and not on 
string processing (and certainly not on encoding formats such as Unicode-UTF8, 
which are meant for data exchange portability and should not be used for core 
computation). 

Here is the interface to our simplistic automata, reduced to deterministic 
transitions over a lexicon tree. Each state is labeled with a boolean (indicating 
whether or not it is an accepting state), and points to the list of its successor 
states, labeled with a letter. 


module Auto : sig 
  type auto = [ State of (bool * deter) ] 
  and deter = list (Word.letter * auto); 
end; 


We assume that at most one transition issued from a given state is labeled 
with a given letter. The datatype auto is here isomorphic to lexical trees, or 
dictionary, also called tries. We may also assume that dead alleys, i.e. states 
which do not have an accepting node as a substructure, are ruled out. Thus the 
contraction of the tree as a dag, using for instance the corresponding instance of 
the sharing functor [5], yields the minimal automaton that recognizes the finite 
language stored in the dictionary. 
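
As an illustration of this datatype, here is a minimal sketch in standard OCaml syntax (rather than Pidgin ML) of a lexicon tree with insertion and lookup. The functions `enter` and `mem` and the sample lexicon are assumptions made for this example; they are not claimed to be the Zen toolkit's own definitions.

```ocaml
type letter = int
type word = letter list

(* a state carries its acceptance flag and its deterministic transitions *)
type auto = State of bool * deter
and deter = (letter * auto) list

let empty = State (false, [])

(* enter a word, keeping at most one transition per letter in each state *)
let rec enter (State (b, det)) = function
  | [] -> State (true, det)
  | c :: rest ->
      let sub = try List.assoc c det with Not_found -> empty in
      State (b, (c, enter sub rest) :: List.remove_assoc c det)

(* deterministic lookup: follow the transitions, then test acceptance *)
let rec mem (State (b, det)) = function
  | [] -> b
  | c :: rest ->
      (match List.assoc_opt c det with
       | Some next -> mem next rest
       | None -> false)

(* a toy lexicon containing the words 1.2, 1.2.3 and 4 *)
let lexicon = List.fold_left enter empty [[1; 2]; [1; 2; 3]; [4]]
```

The words 1.2 and 1.2.3 share their initial path, and the prefix 1 alone is not accepted since the state it reaches is not marked accepting.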


1.2 Dispatching 


We call phases the lexical categories, which constitute the alphabet of the reg- 
ular expression defining the morphological geometry. We compile this regular 
expression using the Berry-Sethi method, which linearizes the expression, and 
computes the local automaton associated to this linearization [2, 3]. 

We recall that local automata (also called Glushkov automata) are finite 
automata such that all transitions labeled with a given letter lead to the same 
state, characteristic of this letter. States may thus be named with letters, here 
phases. It is this locality condition which is a key to modularity. 

A local automaton is described by an initial phase, a set of terminal phases, 
here represented as a boolean function over phases, and a dispatch transition 
function, mapping each phase to a set of following phases, sequentialized here 
as a list. In the notations of [2], initial is called 1, dispatch is called follow, and 
terminal is implicit from the use of an endmarker symbol. In the terminology of 
Eilenberg [4], the set of non-empty words recognized by a local automaton is a 
local set over phases. 

In the Zen toolkit implementation, the Dispatch module is actually generated 
by meta-programming, i.e. it is compiled from the regular expression, as we shall 
explain in section 4. 


1.3 Scheduling 


We are now ready to start the description of the reactive engine, as a functor 
taking a module Dispatch as parameter, and using its dispatch function as a 
local scheduler. Here is the corresponding specification of our React module. We 
assume the utility programming functions fold_right (list iterator), assoc, length, 
mem, etc. from the List standard library. 


module React 
  (Dispatch: sig 
    type phase = a; 
    value transducer : phase -> auto; 
    value initial : phase; 
    value terminal : phase -> bool; 
    value dispatch : phase -> list phase; 
  end) = struct 
type input = Word.word 
and backtrack = [ Advance of phase and input ] 
and resumption = list backtrack; 


A resumption value stores as a datum what is necessary to resume our reac- 
tive engine as a coroutine. 
The scheduler gets its phase transitions from dispatch. It respects the order 
of dispatching. 


value schedule phase input cont = 
  let add phase cont = [ Advance phase input :: cont ] in 
  fold_right add (dispatch phase) cont; 


1.4 React 


The reactive engine originates from the Sanskrit segmenter described in [9], 
generalized to the framework of mixed automata defined in [8]. 

Here we have a much simpler framework, since we do not have transducer 
output, but we get a modular interpreter, driven by the phase scheduler. 

In the following definition, phase is the current phase, input is the input 
tape represented as a word, back is the backtrack stack of type resumption, 
and state is the current state of type auto. We favor deterministic transitions 
within a phase to non-deterministic transitions to the next phase(s). Within a 
phase, we favor longer words over shorter ones. Phase transitions are effected 
in dispatch order. We have a mutual inductive definition between the reactive 
engine, reading forward, and the continuation manager, backtracking on failure. 


exception Finished; 

value rec react phase input back state = match state with 
  [ State (b, det) -> 
    let deter cont = match input with 
      [ [] -> continue cont 
      | [ letter :: rest ] -> 
          try let state' = assoc letter det in 
              react phase rest cont state' 
          with [ Not_found -> continue cont ] 
      ] in 
    if b (* accepting *) then 
      if input = [] (* end of input *) then 
        if terminal phase then back (* solution found *) 
        else continue back 
      else let cont = schedule phase input back in 
           deter cont 
    else deter back 
  ] 
and continue = fun 
  [ [] -> raise Finished 
  | [ resume :: back ] -> match resume with 
      [ Advance phase input -> 
          react phase input back (transducer phase) 
      ] 
  ]; 
1.5 Usage 


The initialization of the reactive engine consists in setting the backtrack stack 
to the single initial state given by Dispatch.initial, input being initialized to the 
full sentence: 


value init_react sentence = [ Advance initial sentence ]; 


We may now recognize a string as belonging to the rational language de- 
scribed by the regular expression by calling the reactive continuation manager 
with this initial resumption: 


value react1 sentence = continue (init_react sentence); 


If the sentence belongs to the language, react1 will return with a resumption 
value, otherwise it will throw the exception Finished. The resumption value is 
not of use in this simple model, where the interpreter is used as a mere recognizer. 
In more elaborate versions below, react may be used as a coroutine in order to 
compute a stream of transductions. 
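
To make the behaviour of the recognizer concrete, the engine of this section can be transliterated into standard OCaml syntax and run on a toy instance. The sketch below is such a transliteration under stated assumptions: the phases `Init`, `Prefix`, `Stem` (for the expression prefix*.stem), the two small lexicons, and the helpers `trie`, `recognizes` and `count` are all hypothetical and introduced only for this example; the dictionary lookup uses `List.assoc_opt` in place of the `try ... with Not_found` of the text.

```ocaml
type letter = int
type word = letter list
type auto = State of bool * (letter * auto) list

(* hypothetical phases for the expression prefix*.stem *)
type phase = Init | Prefix | Stem

(* build a lexicon tree from a list of words (helper for this example) *)
let rec enter (State (b, det)) = function
  | [] -> State (true, det)
  | c :: rest ->
      let sub = try List.assoc c det with Not_found -> State (false, []) in
      State (b, (c, enter sub rest) :: List.remove_assoc c det)
let trie words = List.fold_left enter (State (false, [])) words

let transducer = function
  | Init -> trie [[]]               (* L(initial) is the singleton {epsilon} *)
  | Prefix -> trie [[1]]
  | Stem -> trie [[2]; [1; 2]]
let initial = Init
let terminal = function Stem -> true | _ -> false
let dispatch = function
  | Init | Prefix -> [Prefix; Stem] (* never contains the initial phase *)
  | Stem -> []

type backtrack = Advance of phase * word
exception Finished

(* stack the following phases, respecting the order of dispatching *)
let schedule phase input cont =
  List.fold_right (fun phase cont -> Advance (phase, input) :: cont)
    (dispatch phase) cont

let rec react phase input back (State (b, det)) =
  let deter cont = match input with
    | [] -> continue cont
    | letter :: rest ->
        (match List.assoc_opt letter det with
         | Some state' -> react phase rest cont state'
         | None -> continue cont)
  in
  if b then
    if input = [] then
      if terminal phase then back   (* solution found *)
      else continue back
    else deter (schedule phase input back)
  else deter back
and continue = function
  | [] -> raise Finished
  | Advance (phase, input) :: back ->
      react phase input back (transducer phase)

let react1 sentence = continue [Advance (initial, sentence)]

(* use the engine as a mere recognizer *)
let recognizes sentence =
  match react1 sentence with _ -> true | exception Finished -> false

(* use the returned resumption as a coroutine to count all analyses *)
let rec count_from r =
  match continue r with r' -> 1 + count_from r' | exception Finished -> 0
let count sentence = count_from [Advance (initial, sentence)]
```

With these toy lexicons, the sentence 1.1.2 has two analyses (prefix 1, prefix 1, stem 2 and prefix 1, stem 1.2), whereas the sentence 1 has none because Prefix is not a terminal phase.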

Note that classical formal languages theory abstracts a language as a set 
of words, or occasionally as a multiset (hiding structural idempotence) when 
multiplicities matter. Here we hide structural commutativity as well, obtaining 
streams of solutions, where computational details such as fairness, essential for 
completeness, may be revealed and discussed. 


1.6 Correctness, Completeness 


Let us be given a module Dispatch by its components phase (an ordered list 
of discrete phase values defining the alphabet), initial (the initial phase), and 
functions transducer : phase → auto, terminal : phase → bool and dispatch : 
phase → list phase. 

Let $L(\varphi)$ be the language recognized by the automaton transducer($\varphi$), for 
a given phase $\varphi$. We assume that $L(\mathit{initial})$ is the singleton $\{\varepsilon\}$ where $\varepsilon$ is the 
empty word [], that $L(\varphi)$ does not contain $\varepsilon$ for any other phase $\varphi$, and that for 
every phase $\varphi$ the list dispatch($\varphi$) does not contain initial. These invariants will 
be enforced by the Berry-Sethi compiler presented in section 4. 

Let us say that a sequence $((\varphi_1, w_1), \ldots (\varphi_n, w_n))$ is a valid analysis of a given 
word $w$ whenever, taking $\varphi_0 = \mathit{initial}$, we get $w = w_1 \cdot w_2 \cdot \ldots w_n$ with 
$(0 < i \le n)$ $w_i \in L(\varphi_i)$, $(0 \le i < n)$ $\varphi_{i+1} \in \mathit{dispatch}(\varphi_i)$, and $\mathit{terminal}(\varphi_n) = \mathit{True}$. 
For $i > 0$, we know from the assumptions above that $L(\varphi_i)$ does not contain the 
empty word, so there is a finite number of such analyses. 

We define a total ordering on analyses by the lexicographical ordering 
generated by $(\varphi, w) < (\varphi', w')$ iff either $\varphi$ precedes $\varphi'$ in the common dispatch list 
where $\varphi$ and $\varphi'$ belong, or else $\varphi = \varphi'$ and $w'$ is a strict initial prefix of $w$. 

The correctness and completeness of the react algorithm may be established by 
proving that it generates the set of valid analyses of an input word $w$ in the sense 
that it implicitly builds a sequence of pairs of analyses of $w$ and resumptions 
$((a_1, r_1), \ldots, (a_N, r_N))$ such that, taking $r_0$ = init_react $w$, for each $i$ $(0 \le i < N)$ 
the evaluation of continue $r_i$ terminates with value $r_{i+1}$, and the evaluation 
of continue $r_N$ raises the exception Finished. Furthermore the list $(a_1, \ldots, a_N)$ 
contains all the valid analyses of $w$, listed increasingly with respect to the above 
ordering. 

We shall not give a formal proof of this rather tedious property, which can 
be established by computational induction. 

We remark that this argument makes explicit the fact that, within a given 
phase, we search for longer partial solutions before shorter ones. This is a rather 
arbitrary heuristic, which is convenient for the segmenting application. 


2 Modular Aums 


So far our automata have been mere recognizers for finite sets of words, i.e. 
dictionaries. Chaining them through phases, we may for instance model simple 
segmentation problems, where a sentence is defined as a list of words separated 
by blanks or punctuation signs, and words are defined as compounds of mor- 
phemes, according to prefix, suffix, or other finite-state regimes. Such a segmenter 
may be composed with a tagger, when the word dictionaries are decorated with 
morphological derivation annotations, using the structure of revmaps [5], which 
allows efficient sharing of morphological regularities. 

We now allow more complex automata for the various phases. For instance, 
we may allow a notion of transition with virtual addresses, allowing both 
non-deterministic moves (including ε-transitions), and cycles. 

Virtual addresses, as opposed to pointers and explicit cyclic structures, pro- 
vide a declarative mechanism respecting sharing. In the original presentation of 
aums [8], two varieties of virtual addresses are proposed: absolute addresses, in- 
dexing a state by its absolute access path in the forest of deterministic skeletons, 
and relative addresses, indexing a state in the current covering trie by the 
shortest path in the tree, encoded as a differential word pairing a natural number 
(how many levels in the tree you should go up) with a word (indexing the target 
state down from the closest common ancestor). These differential words are used 
for instance in the revmap structure, to store the reverse morphology. 

In the next section, we shall ignore relative addresses, which necessitate a 
slightly more complex apparatus for their proper evaluation, since sharing makes 
ambiguous the inheritance relation, and thus access paths must be maintained in 
the automaton structure for proper interpretation. We shall present first simple 
absolute addresses. Furthermore, the role of the forest index will be played by 
the phase: to each phase corresponds a unique auto structure, covering all the 
states pertaining to this phase. 


2.1 Mixed Automata with Virtual Absolute Addresses 


A transition (w,v) recognizes word w on the input tape (the "guard" of the 
transition), and jumps to the state absolutely addressed by v in the next phase. 
module Auto : sig 
  type transition = (Word.word * Word.word) 
  and choices = list transition; 
  type auto = [ State of (deter * choices) ] 
  and deter = list (Word.letter * auto); 
end; 


We take as convention that the state State(d,c) is accepting iff c is not empty. 
We now define acceptance as the condition on external transitions (w,v) when 
the input is empty, the (next) phase is terminal, and the access parameter v 
satisfies a final condition which we shall not specify further. Typically, v is final 
if it is empty or if it consists of a special end-of-sentence marker. 


2.2 Service Routine 


Our resumptions are now more complex, since we have non-deterministic choice 
points: 


type backtrack = 
  [ Choose of phase and input and auto and choices 
  | Advance of phase and input and word 
  ] 
and resumption = list backtrack; 


exception Finished; 

Here are two service routines for guard management. 

exception Guard; 
value rec advance n w = if n = 0 then w else match w with 
  [ [] -> raise Guard 
  | [ _ :: tl ] -> advance (n-1) tl 
  ]; 

Thus advance n [$a_1$; ... $a_N$] = [$a_p$; ... $a_N$], where $p = n + 1$, whenever 
$n \le N$; otherwise the exception Guard is raised. 
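
For readers who want to try the routine directly, the following is a sketch of the same function in standard OCaml syntax; the value `too_short` is an illustrative name added here to show the failure case.

```ocaml
exception Guard

(* drop the first n letters of w; raise Guard when w has fewer than n letters *)
let rec advance n w =
  if n = 0 then w
  else match w with
    | [] -> raise Guard
    | _ :: tl -> advance (n - 1) tl

(* the guard fails when the input is shorter than the guard word *)
let too_short =
  try ignore (advance 4 [1; 2; 3]); false with Guard -> true
```

Consuming exactly the whole input is allowed and returns the empty word, which is how the engine detects that the input is finished.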


(* access : phase -> word -> auto *) 
value access phase = acc (transducer phase) 
  where rec acc state = fun 
    [ [] -> state 
    | [ c :: rest ] -> match state with 
        [ State (deter, _) -> 
            acc (List.assoc c deter) rest ] 
    ]; 
2.3 React for Aums 


We use a schedule function similar to the previous one; it now stores the v access path 
for the next phase transition. 


value schedule phase input v cont = 
  let add phase cont = [ Advance phase input v :: cont ] 
  in fold_right add (dispatch phase) cont; 


We are now ready to present the reactive engine. It consists in three si- 
multaneous inductions, the main one react managing the deterministic search, 
while stacking non-deterministic choice points, the second choose managing non- 
deterministic jumps, and the third continue backtracking in case of dead end. 
We favor deterministic transitions over non-deterministic ones. 


(* phase is the parsing phase, 
   input is the input tape represented as a word, 
   back is the backtrack stack of type resumption, 
   state is the current state of type auto *) 
value rec react phase input back state = 
  match state with 
  [ State (det, choices) -> 
    (* we explore the deterministic space first *) 
    let cont = if choices = [] then back else 
               [ Choose phase input state choices :: back ] 
    in match input with 
       [ [] -> continue cont 
       | [ letter :: rest ] -> 
           try let next_state = assoc letter det in 
               react phase rest cont next_state 
           with [ Not_found -> continue cont ] 
       ] 
  ] 
and choose phase input back state = fun 
  [ [] -> continue back 
  | [ (w, v) :: others ] -> 
      let cont = if others = [] then back else 
                 [ Choose phase input state others :: back ] 
      in try let tape = advance (length w) input in 
             if tape = [] (* input finished *) then 
               if terminal phase && final v then cont 
               else continue cont 
             else continue (schedule phase tape v cont) 
         with [ Guard -> continue cont ] 
  ] 
and continue = fun 
  [ [] -> raise Finished 
  | [ resume :: back ] -> match resume with 
      [ Choose phase input state choices -> 
          choose phase input back state choices 
      | Advance phase input word -> 
          try let next_state = access phase word 
              in react phase input back next_state 
          with [ Not_found -> continue back ] 
      ] 
  ]; 

Finally, here is the initialisation routine, building the initial resumption: 

value init_react input = [ Advance initial input [] ]; 

As previously, we may recognize a sentence using: 

value react1 sentence = continue (init_react sentence); 


2.4 Correctness, Completeness 


Similarly to the previous section, we may prove the correctness and completeness 
of the construction, provided the guard w of each non-deterministic transition 
is non-empty. We may refine this condition as follows. 


Definition: Guard condition. There is no cycle of transitions of an aum all 
of which have an empty guard: $(\varepsilon, w_1); (\varepsilon, w_2); \ldots (\varepsilon, w_n)$. By cycle we mean that, 
for some access word $w_0$ in the current phase $\varphi_0$ leading in transducer($\varphi_0$) to 
state $\sigma_0$, $\sigma_0$ has among its choices $(\varepsilon, w_1)$, $\varphi_1$ in dispatch($\varphi_0$) with $w_1$ leading in 
transducer($\varphi_1$) to state $\sigma_1$, etc., until $\sigma_n = \sigma_0$. 

We claim that react terminates on an input word whenever the guard con- 
dition is verified. Note that this is a global condition on the family of aums, 
which requires the knowledge of the phase transition relation, but which may be 
checked in time linear in the cumulated size of the aum family. 


3 Modular Aum Transducers 


We now give the final refinement of our construction, with aums having both 
local and global virtual addresses. 


3.1 Transducers 

module Auto : sig 
  type continuation = (Word.word * Word.word) 
  and transition = 
    [ External of (Word.word * continuation) 
    | Internal of (Word.word * Word.delta) 
    ]; 
  type auto = [ State of (deter * choices) ] 
  and deter = list (Word.letter * auto) 
  and choices = list transition; 
end; 


An internal transition Internal(w,d) recognizes w on the input tape and 
jumps to the state relatively addressed by d within the same phase. This uses 
the notion of differential word [5] from module Word: 


type delta = (int * word); (* differential words *) 


A differential word is a notation that allows a word w' to be retrieved from
another word w sharing a common prefix with it. It denotes the minimal path
connecting the words in a trie, as a sequence of ups and downs: if
$\delta = (n,u)$ we go up n times and then down along word u. In order to
interpret the n part, we need to keep the stack of states leading locally to
the current state. We keep along this stack the corresponding word path as
well; this is useful as a parameter to the output computation.
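
To make the up/down reading concrete, here is a small sketch in standard OCaml syntax. It assumes that letters are integers and words are lists of letters; the helper names `diff`, `patch` and `take` are ours and are not claimed to match those of module Word.

```ocaml
(* Assumption: letters are integers and words are lists of letters. *)
type word = int list
type delta = int * word

(* diff w w' = (n, u): from w, go up n letters, then down along u to reach w'. *)
let rec diff (w : word) (w' : word) : delta =
  match w, w' with
  | c :: r, c' :: r' when c = c' -> diff r r'   (* skip the common prefix *)
  | _ -> (List.length w, w')

(* take k l returns the first k elements of l (or all of l if it is shorter). *)
let rec take k = function
  | x :: r when k > 0 -> x :: take (k - 1) r
  | _ -> []

(* patch (n, u) w drops the last n letters of w and appends u. *)
let patch ((n, u) : delta) (w : word) : word =
  let keep = List.length w - n in
  if keep < 0 then failwith "patch: cannot go up that far"
  else take keep w @ u
```

For instance, from the word [1; 2; 3] the differential word (1, [4; 5]) leads to [1; 2; 4; 5]: one step up to the common prefix [1; 2], then down along [4; 5].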

An external transition External(w,c) recognizes w on the input tape and 
executes the continuation c in a following phase. A continuation (u,v) returns 
words u as output parameter and v as access parameter in the next phase trans- 
ducer. 

As above, we define acceptance as the condition on an external transition when
the input is empty, the phase is terminal, and the access parameter v satisfies
a final condition which we shall not specify further.


3.2 Modular Transducers 


We now produce output, as words labeled by their phase. 


type input = Word.word
and output = list (phase * Word.word);


The access stack has a letter component and a state component. The state 
component is necessary to interpret the part of the internal virtual address which 
concerns going up, whereas the letter component, i.e. the absolute name of the 
state in the current phase is useful for computing the transducer output. 


type stack = list (Word.letter * auto);


type backtrack =
  [ Choose of phase and input and output
           and auto and stack and choices
  | Advance of phase and input and output and Word.word
  ]
and resumption = list backtrack;


Since the Advance resumption now has an output component and an access
component (anticipating a prefix of the next phase component), we parameterize
the scheduler accordingly:


value schedule phase input output access cont =
  let add phase cont =
    [ Advance phase input output access :: cont ]
  in fold_right add (dispatch phase) cont;


The service routine access manages the access stack; the functions pop and
push are used to interpret internal jumps.


(* access : phase -> word -> (auto * stack) *)
value access phase = acc (transducer phase) []
  where rec acc state stack = fun
    [ [] -> (state,stack)
    | [ c :: rest ] -> match state with
        [ State (deter,_) ->
            acc (assoc c deter) [ (c,state) :: stack ] rest
        ]
    ];


value rec pop n state stack =
  if n=0 then (state,stack)
  else match stack with
       [ [] -> raise (Failure "Wrong Internal jump")
       | [ (_,st) :: rest ] -> pop (n-1) st rest
       ]
and push w state stack = match w with
  [ [] -> (state,stack)
  | [ c :: rest ] -> match state with
      [ State (deter,_) ->
          push rest (assoc c deter) [ (c,state) :: stack ]
      ]
  ];


value jump (n,w) state stack =
  let (state0,stack0) = pop n state stack
  in push w state0 stack0;


We provide the access stack as an output parameter via an extracting routine: 


value extract stack (_,(u,_)) =
  fold_left unstack u stack
  where unstack acc (c,_) = [ c :: acc ];


3.3 Modular Reacting Transducers 


We have a similar structure of three mutually recursive functions, but now choose
has two cases, for the two transition constructors.


value rec react phase input output back stack state =
  match state with
  [ State (det,choices) ->
      let cont = if choices=[] then back else
        [ Choose phase input output state stack choices :: back ]
      in match input with
         [ [] -> continue cont
         | [ letter :: rest ] ->
             try let state' = assoc letter det
                 and stack' = [ (letter,state) :: stack ] in
                 react phase rest output cont stack' state'
             with [ Not_found -> continue cont ]
         ]
  ]
and choose phase input output back state stack = fun
  [ [] -> continue back
  | [ External((w,(u,v)) as rule) :: others ] ->
      let cont = if others=[] then back else
        [ Choose phase input output state stack others :: back ]
      in try let tape = advance (length w) input
             and out = [ (phase,extract stack rule) :: output ]
             in if tape = [] (* input finished *) then
                  if terminal phase && final v then (out,cont)
                  else continue cont
                else continue (schedule phase tape out v cont)
         with [ Guard -> continue cont ]
  | [ Internal(w,delta) :: others ] ->
      let cont = if others=[] then back else
        [ Choose phase input output state stack others :: back ]
      in try let tape = advance (length w) input
             and (state',stack') = jump delta state stack
             in react phase tape output cont stack' state'
         with [ Guard -> continue cont ]
  ]
and continue = fun
  [ [] -> raise Finished
  | [ resume :: back ] -> match resume with
      [ Choose phase input output state stack choices ->
          choose phase input output back state stack choices
      | Advance phase input output word ->
          try let (state',stack') = access phase word
              in react phase input output back stack' state'
          with [ Not_found -> continue back ]
      ]
  ];


3.4 Correctness, Completeness


The definitions of trace and analysis may be extended to the case of transducers, 
and the correctness and completeness of our engine may be formally proved in 
the sense that all transductions of the input word are properly generated, for a 
notion of left-to-right transduction. We omit here the full formal development. 

In the case of non overlapping junction transductions, as defined in [9], the 
construction simplifies, since Internal transitions are not needed. The proofs of 
termination, correctness and completeness of the reactive engine are carried out 
in full in [9], for the simple case of one phase junction relations verifying a non- 
overlapping criterion. This criterion allows parallel computation of the relation 
along phases, without the need to cascade the transductions. Furthermore, such 
relations are invertible, and the reactive engine may thus be used to invert eu- 
phony and return segmentation solutions, even when the euphony relation is not 
length-preserving. 

Other variations may be considered, since the presence or absence of output 
transitions is orthogonal to the structure of virtual addresses. We have considered 
virtual addresses of two kinds, internal and external. We may also imagine other 
encodings of jumps, potentially relevant for specific applications. For instance, 
specific encodings, relying on the fact that the underlying alphabet is boolean, 
may be used to represent boolean circuits, in the manner of BDD structures. 

The general problem of compiling an arbitrary finite-state machine descrip- 
tion into some variety of our aum structures is not addressed in the current pa- 
per. This problem has many degrees of freedom, since there is a choice between 
mapping state transitions into the deterministic skeleton, on one hand, and the 
non-deterministic choices sequences, on the other; in the latter case, there is a 
further choice between External and Internal jumps. Finally, the partition into 
phases may be more or less coarse, and extra encoding letters, disjoint from the 
input alphabet, may be used to attach orphan states. We should not expect one 
uniform best solution to this problem anyway, and compiling strategies may well 
depend on the application domain. 

Remark. 

In [9], section 8.1, the recursive call from choose calls react with occ parameter 
v, instead of rev v as effected above for next_stack. This is a local optimisation 
for the case of sandhi, where the junction rules are such that the length of 
component v is at most 1. 


4 Dispatch Synthesis from Regular Expressions 


We now explain how to synthesize the dispatch function from a regular expression
representation of the phase language, using the Berry-Sethi algorithm [2]. The
basic idea is that we compose a number of finite automata/transducers, each
named with a phase. Phases are the letters of an alphabet, and we define the
admissible joint behaviour of our automata as a rational language over the phase
alphabet, specified by a regular expression.


4.1 Regular Expressions and Their Linearization


Here is the type of regular expressions. The type parameter 'a is used to abstract
from the symbol representation.


type regexp 'a =
  [ One
  | Symb of 'a
  | Union of regexp 'a and regexp 'a
  | Conc of regexp 'a and regexp 'a
  | Star of regexp 'a
  | Epsilon of regexp 'a
  | Plus of regexp 'a
  ];


We use a specific constructor Plus rather than defining $R^+$ as the macro
$R \cdot R^*$, because of the blow-up due to its non-linearity.
We mark symbols with an integer to linearize the regular expression.


type marked 'a = ('a * int);


A symbol s is mapped to (s,0) if it occurs only once, and to (s,1), (s,2),
etc. otherwise. Marked symbols are used as states of the recognizing automaton.
The type local represents local automata, in the sense of Eilenberg, as a 4-tuple
defining its initial state, the other states, the transitions, and the terminal states:


type local 'a =
  ( marked 'a * list (marked 'a)
  * list (marked 'a * list (marked 'a))
  * list (marked 'a)
  );

We skip the details of the linearization function mark, which is straightforward.
The function mark takes as argument a regexp 'a, and returns a pair of type
regexp (marked 'a) * list (marked 'a), consisting of the marked expression, and
the list of marked symbols which will be used as states of the local automaton.
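
Since the paper omits mark, the following is only one possible reading of the specification above, written in standard OCaml syntax rather than the revised syntax of the listings. It follows the stated convention (index 0 for a symbol occurring once, indices 1, 2, ... otherwise) and numbers occurrences from left to right, which is our assumption. The helper names `symbols`, `fresh` and `go` are hypothetical, and the occurrence count is done naively rather than in linear time.

```ocaml
type 'a regexp =
  | One
  | Symb of 'a
  | Union of 'a regexp * 'a regexp
  | Conc of 'a regexp * 'a regexp
  | Star of 'a regexp
  | Epsilon of 'a regexp
  | Plus of 'a regexp

(* All symbol occurrences of an expression, from left to right. *)
let rec symbols = function
  | One -> []
  | Symb s -> [ s ]
  | Union (a, b) | Conc (a, b) -> symbols a @ symbols b
  | Star a | Epsilon a | Plus a -> symbols a

(* mark e returns the linearized expression and its marked symbols. *)
let mark e =
  let all = symbols e in
  let count s = List.length (List.filter (fun t -> t = s) all) in
  let seen = Hashtbl.create 16 in     (* occurrences numbered so far *)
  let fresh s =
    if count s = 1 then (s, 0)
    else begin
      let n = 1 + (try Hashtbl.find seen s with Not_found -> 0) in
      Hashtbl.replace seen s n;
      (s, n)
    end
  in
  let states = ref [] in
  let rec go = function
    | One -> One
    | Symb s ->
        let m = fresh s in
        states := m :: !states;
        Symb m
    | Union (a, b) ->
        let a' = go a in
        let b' = go b in
        Union (a', b')
    | Conc (a, b) ->
        let a' = go a in
        let b' = go b in
        Conc (a', b')
    | Star a -> Star (go a)
    | Epsilon a -> Epsilon (go a)
    | Plus a -> Plus (go a)
  in
  let e' = go e in
  (e', List.rev !states)
```

The explicit let bindings in the binary cases force a left-to-right traversal, so that occurrence numbers do not depend on OCaml's unspecified evaluation order for constructor arguments.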


4.2 The Berry-Sethi Compiler 


We basically follow the construction given in [2], with the addition of the Plus
operation. We need an intermediate structure of discriminating regular
expressions, which makes explicit whether the associated rational language
contains the empty word $\epsilon$ or not.


type d_regexp 'a =
  [ DOne
  | DSymb of 'a
  | DUnion of bool and d_regexp 'a and d_regexp 'a
  | DConc of bool and d_regexp 'a and d_regexp 'a
  | DStar of d_regexp 'a
  | DEpsilon of d_regexp 'a
  | DPlus of bool and d_regexp 'a
  ];


We can test this property in constant time with function delta, and translate
a regexp into a discriminating regexp in linear time with function discr.


value delta = fun
  [ DOne -> True
  | DSymb _ -> False
  | DUnion b _ _ | DConc b _ _ -> b
  | DStar _ | DEpsilon _ -> True
  | DPlus b _ -> b
  ];


(* discr : regexp 'a -> d_regexp 'a *)
value rec discr = fun
  [ One -> DOne
  | Symb s -> DSymb s
  | Union e1 e2 ->
      let de1 = discr e1 and de2 = discr e2 in
      DUnion (delta de1 || delta de2) de1 de2
  | Conc e1 e2 ->
      let de1 = discr e1 and de2 = discr e2 in
      DConc (delta de1 && delta de2) de1 de2
  | Star e -> DStar (discr e)
  | Epsilon e -> DEpsilon (discr e)
  | Plus e ->
      let de = discr e in
      DPlus (delta de) de
  ];


The core of the algorithm is the computation of sets first, follow and last. 


(* first : list 'a -> d_regexp 'a -> list 'a *)
value rec first l = fun
  [ DOne -> l
  | DSymb d -> [ d :: l ]
  | DUnion _ e1 e2 -> first (first l e2) e1
  | DConc _ e1 e2 ->
      if delta e1 then first (first l e2) e1
      else first l e1
  | DStar e | DEpsilon e | DPlus _ e -> first l e
  ];


(* follow : 'a -> regexp 'a -> list ('a * list 'a) *)
value follow initial exp =
  let rec f1 exp l fol =
    match exp with
    [ DOne -> fol
    | DSymb d -> [ (d,l) :: fol ]
    | DUnion _ e1 e2 ->
        let fol2 = f1 e2 l fol in f1 e1 l fol2
    | DConc _ e1 e2 ->
        let fol2 = f1 e2 l fol in
        let l1 = if delta e2 then first l e2
                 else first [] e2 in
        f1 e1 l1 fol2
    | DStar e | DPlus _ e ->
        let l_res = first l e in
        f2 e l_res fol
    | DEpsilon e -> f1 e l fol
    ]
  and f2 exp l fol = (* (first [] exp) already in l *)
    match exp with
    [ DOne -> fol
    | DSymb d -> [ (d,l) :: fol ]
    | DUnion _ e1 e2 ->
        let fol2 = f2 e2 l fol in f2 e1 l fol2
    | DConc _ e1 e2 ->
        let b1 = delta e1
        and b2 = delta e2 in
        if b1 (* l1 and l2 in l *)
        then if b2
             then f2 e1 l (f2 e2 l fol)
             else f1 e1 (first [] e2) (f2 e2 l fol)
        else if b2
             then f2 e1 (first l e2) (f1 e2 l fol)
             else f1 e1 (first [] e2) (f1 e2 l fol)
    | DStar e | DEpsilon e | DPlus _ e -> f2 e l fol
    ] in
  let fol_sets = f1 exp [] []
  and initials = first [] exp in
  [ (initial,initials) :: fol_sets ];


Functions f1 and f2 both compute the follow sets of Berry-Sethi but with
different assertions on their arguments; precisely, a call (f1 exp l fol) is such
that first elements of exp are not in l, and the contrary assertion obtains for f2.
Thus we never attempt to add elements already present in l, which maintains a
constant cost of adding an element in l.


(* last : 'a -> regexp 'a -> list 'a *)
value last initial e =
  let rec last_rec l = fun
    [ DOne -> l
    | DSymb d -> [ d :: l ]
    | DUnion _ e1 e2 ->
        last_rec (last_rec l e2) e1
    | DConc _ e1 e2 ->
        if delta e2 then last_rec (last_rec l e2) e1
        else last_rec l e2
    | DStar e | DEpsilon e | DPlus _ e -> last_rec l e
    ]
  in
  let l = last_rec [] e in
  if delta e then [ initial :: l ] else l;


Now we have all the ingredients to compile a regular expression: 


(* compile : marked 'a -> regexp 'a -> local 'a *)
value compile initial exp =
  let (exp_m, states) = mark exp in
  let exp_d = discr exp_m in
  let fol = follow initial exp_d
  and lasts = last initial exp_d in
  (initial, states, fol, lasts);


4.3 Parametric Regular Expressions 


We now define systems of regular expressions over parametric alphabets whose 
symbols are associated to aums. Meta-variables allow sharing in such descrip- 
tions. We skip the details of the syntax, and present just an example of such a fi- 
nite machine description, actually a subproblem of Sanskrit morphology, namely 
noun phrases representing compound substantives. 


initial init epsilon_aum
alphabet noun ; iic ; ifc end


automaton Disp
node SUBST = iic* . (noun | iic.ifc)
end


Here we specify that the initial phase is called init, that the user must
provide a value epsilon_aum for the aum recognizing just the empty word,
as well as aum values noun, iic and ifc for recognizing the corresponding
languages. We are interested in the language iic* . (noun | iic.ifc). In the
intended application, SUBST is the language of substantive forms, containing
noun forms as well as compounds, formed with prefix iic forms which may be
iterated, and suffix ifc forms.

We skip the details of the parsing of such a description. The current syntax
allows systems of regular expressions with sharing, and the compiler
unfolds the system into a flattened expression.

We use the meta-programming facilities provided by the Camlp4 preprocessor,
which allows macro-generation of an OCaml program at the level of abstract
syntax. Skipping the details of this meta-programming, we obtain mechanically,
for the above example, the following module text.


module Automata (Auto : sig type auto = 'a; end) =
struct
  type auto_vect =
    { epsilon_aum : Auto.auto;
      noun : Auto.auto; iic : Auto.auto; ifc : Auto.auto };
  module Disp (Fsm : sig value autos : auto_vect; end) =
  struct
    type phase =
      [ Init | Iic1 | Noun | Iic2 | Ifc ];
    value transducer = fun
      [ Init -> Fsm.autos.epsilon_aum
      | Iic1 -> Fsm.autos.iic
      | Noun -> Fsm.autos.noun
      | Iic2 -> Fsm.autos.iic
      | Ifc -> Fsm.autos.ifc
      ];
    value dispatch = fun
      [ Init -> [ Iic1; Noun; Iic2 ]
      | Iic1 -> [ Iic1; Noun; Iic2 ]
      | Noun -> []
      | Iic2 -> [ Ifc ]
      | Ifc -> []
      ];
    value initial = Init;
    value terminal phase = List.mem phase [ Noun; Ifc ];
  end;
end;


We now have all the components we wish to assemble, since the module
instantiation (Automata Auto), for Auto one of the aum description modules
given in the previous sections, creates a module Dispatch=(Disp Fsm) having
the right functionality, with module Fsm holding the aum implementations. In
this simple example these implementations are the various lexicons
corresponding to the respective lexical categories. In the Sanskrit platform,
these aums are decorated with non-deterministic transitions (using external
addressing) corresponding to sandhi prediction.
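
The dispatch and terminal tables of the generated module can be cross-checked with a small sketch of the first/last/follow computation. This is written in standard OCaml syntax and is deliberately naive: it recomputes sets with list concatenation instead of using the accumulator discipline of f1 and f2, so it is not the linear-time compiler of Section 4.2. The function names `nullable`, `follow_of` and the value `subst` are ours, and the expression is entered already linearized, with Iic1 and Iic2 standing for the two occurrences of iic.

```ocaml
type 'a regexp =
  | One
  | Symb of 'a
  | Union of 'a regexp * 'a regexp
  | Conc of 'a regexp * 'a regexp
  | Star of 'a regexp
  | Epsilon of 'a regexp
  | Plus of 'a regexp

(* nullable e: does the language of e contain the empty word? *)
let rec nullable = function
  | One | Star _ | Epsilon _ -> true
  | Symb _ -> false
  | Union (a, b) -> nullable a || nullable b
  | Conc (a, b) -> nullable a && nullable b
  | Plus a -> nullable a

(* Symbols that may start a word of e. *)
let rec first = function
  | One -> []
  | Symb s -> [ s ]
  | Union (a, b) -> first a @ first b
  | Conc (a, b) -> if nullable a then first a @ first b else first a
  | Star a | Epsilon a | Plus a -> first a

(* Symbols that may end a word of e. *)
let rec last = function
  | One -> []
  | Symb s -> [ s ]
  | Union (a, b) -> last a @ last b
  | Conc (a, b) -> if nullable b then last a @ last b else last b
  | Star a | Epsilon a | Plus a -> last a

(* follow_of e s: symbols that may come right after s in a word of e. *)
let rec follow_of e s =
  match e with
  | One | Symb _ -> []
  | Union (a, b) -> follow_of a s @ follow_of b s
  | Conc (a, b) ->
      follow_of a s @ follow_of b s
      @ (if List.mem s (last a) then first b else [])
  | Star a | Plus a ->
      follow_of a s @ (if List.mem s (last a) then first a else [])
  | Epsilon a -> follow_of a s

type phase = Init | Iic1 | Noun | Iic2 | Ifc

(* iic* . (noun | iic.ifc), linearized by hand *)
let subst =
  Conc (Star (Symb Iic1), Union (Symb Noun, Conc (Symb Iic2, Symb Ifc)))

let dispatch = function
  | Init -> first subst
  | p -> follow_of subst p

let terminal p = List.mem p (last subst)
```

On this example the sketch yields the same successor lists as the generated dispatch function, and Noun and Ifc are the only terminal phases.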


Remarks. 1. During the Berry-Sethi compiling process, the candidate regular
expression is linearized when a phase occurs more than once. However, the
corresponding automata are shared via the transducer component, recovering the
proper sharing.
2. Our Sanskrit platform¹ now uses this modular methodology, which enforces
the right geometry for morphological chunks, taking care of preverb affixes,
proper recognition of compound forms and periphrastic verbal constructions,
and proper analysis of absolutive forms (with suffixes in -tva for roots and -ya
for verbs admitting preverbs).
3. As usual, we may augment our automata descriptions with weights reflecting
(possibly conditional) probabilities in order to get stochastic automata whose
behaviour reflects hidden Markov chains in the data. Note that the correctness
criteria are invariant under the permutation of choices induced by priority
selection according to these weights.


4.4 A Variant Using Antimirov’s Compiling Algorithm 


V. Antimirov proposed in [1] another algorithm for compiling regular expres- 
sions, using a notion of partial derivative. This algorithm produces automata 
that may be significantly smaller than the ones obtained by the Berry-Sethi 
algorithm. Such automata do not have the locality condition, and now the mod- 
ularity of the construction obtains by a more complex mapping, since the trans- 
ducer invocation does not simply depend on the states, but on the transitions. 
We shall not develop further this variant construction in this paper. 


5 Conclusion 


We have presented a methodology for constructing finite-state machines, such 
as finite automata and transducers, in a modular way. Regular expressions over 
an alphabet of phases express a composition of machines under a finite-state- 
controlled constraint. This corresponds to considering a regular expression not 
as the mere denotation of a rational language over the alphabet of its symbols 
seen as string generators, but rather as a rational polynomial over its symbols, 
abstracting themselves rational sets. The algebraic property of closure of ra- 
tional sets over substitution (mapping symbols to rational sets), together with 
the local automaton representation of finite-state machines, provide the natural 
foundation for the modular composition of finite-state machines. 

Our mechanism allows the controlled interaction of machines compiled as
mixed automata (aums). This is useful for instance for shallow parsing in
computational linguistics applications. For the Sanskrit platform built by the
first author, this makes it possible to build a tagger composing machines which
invert phonology (sandhi analysis) and morphology, with separate machines for
distinct lexical classes, constrained by the geometrical conditions defining
admissible compounds, preverb management, and periphrastic constructions with
auxiliary verbs.

Our design exploits and justifies our functional programming methodology 
as follows: 


1 http://sanskrit.inria.fr/


Applicative programming leads to robust well-structured programs, amenable
to formal proofs and to journal publication, in the spirit of literate
programming: all our programs are rigorously expressed as inductive
definitions over higher-order types.

Functionality is essential to the concise expression of powerful control para- 
digms such as continuations, essential for the definition of coroutine inter- 
preters for non-deterministic search. 

Modularity of the programming language is the essence of the parametricity 
underlying algebraic closure operations, and thus is an essential abstraction 
paradigm. 

Powerful macro-generation mechanisms lead to an effective meta-program- 
ming methodology, tailoring general algorithms to the specific needs of ap- 
plications. 

Despite this very high-level view of software architecture, the resulting
programs are efficient enough for their integration in real size applications,
as witnessed by their use in computational linguistic platforms [9].


To Joseph Goguen on the occasion of his 65th birthday


Abstract. This paper reviews the classical theory of deterministic automata and
regular languages from a categorical perspective. The basis is formed by Rutten’s 
description of the Brzozowski automaton structure in a coalgebraic framework. 
We enlarge the framework to a so-called bialgebraic one, by including algebras 
together with suitable distributive laws connecting the algebraic and coalgebraic 
structure of regular expressions and languages. This culminates in a reformulated 
proof via finality of Kozen’s completeness result. It yields a complete axioma- 
tisation of observational equivalence (bisimilarity) on regular expressions. We 
suggest that this situation is paradigmatic for (theoretical) computer science as 
the study of “generated behaviour”. 


1 Introduction 


In the early seventies Joseph Goguen described automata within a categorical
perspective (see for instance [11,12,13]), together with colleagues Arbib and
Manes [1]. This paper fits in that tradition, using a more modern, bialgebraic
setting, where algebra meets coalgebra. A bialgebra is a combined algebra and
coalgebra $F(X) \to X \to G(X)$ on a common carrier (or state space) $X$,
satisfying a certain compatibility requirement wrt. a distributive law
connecting the two functors $F$, $G$. These bialgebras found application within
the abstract, combined description of operational and denotational semantics
started explicitly by Turi and Plotkin [35,34], and more implicitly by
Rutten and Turi [32]. This is now an active line of work [26205118].

Goguen has always shown an interest in methodological and philosophical issues 
surrounding computing. The work in this paper also lends itself to such reflections. It is 
often claimed that data processing is the subject of the discipline of computer science. 
We think it is more to the point to describe the subject of computer science as generated 
behaviour. This is the behaviour that can be observed on the outside, for instance via a 
screen or printer. It arises in interaction with the environment, as a result of the computer 
executing instructions. 

This behaviouristic approach allows us to understand the relation with natural
sciences: biology is about “spontaneous” behaviour, and physics concentrates on
lifeless natural phenomena, without autonomous behaviour. The generated
behaviour that we claim to be the subject of computer science arises by a
computer executing a program according to strict operational rules. The
behaviour is typically observed via the computer’s I/O. Abstractly, the program
can be understood as an element in an inductively defined set $P$ of terms.
This set thus forms a suitable initial algebra $F(P) \to P$, where the functor
$F$ captures the signature of the operations for forming programs. The
operational rules for the behaviour of programs are described by a coalgebra
$P \to G(P)$, where the functor $G$ captures the kind of behaviour that can be
displayed, such as deterministic, or with exceptions. We see that in abstract
form, generated computer behaviour amounts to the repeated evaluation of an
(inductively defined) coalgebra structure on an algebra of terms. Hence the
bialgebras that form the basic structures used in this paper are at the heart
of computer science.


K. Futatsugi et al. (Eds.): Goguen Festschrift, LNCS 4060, pp. 375-404, 2006.
© Springer-Verlag Berlin Heidelberg 2006


One of the big challenges of computer science is to develop techniques for effec- 
tively establishing properties of generated behaviour. Often such properties are for- 
mulated positively as wanted, functional behaviour. But these properties may also be 
negative, like in computer security, where unwanted behaviour must be excluded. How- 
ever, an appropriate logical view about program properties within the combined alge- 
braic/coalgebraic setting has not been fully elaborated yet. 


A distributive law is a natural transformation $FG \Rightarrow GF$ that
describes (in the current setting) the proper interaction of term-formation and
computational behaviour. The basic observation of [35,34], further elaborated
in [5], is that such natural transformations correspond to specification
formats for operational rules on (inductively defined) programs. A bialgebra is
an algebra-coalgebra pair satisfying a compatibility requirement wrt. a given
distributive law. These bialgebras, as already claimed, form very fundamental
structures in computing, because they combine algebraic structure with the
associated computational behaviour. The compatibility requirement entails
elementary properties like: observational equivalence (i.e. bisimulation wrt.
the coalgebra) is a congruence (wrt. the algebra).


This paper concentrates on deterministic automata, regular expressions and
languages. They form the very basic structures in computer science (see for
instance [28]) which are studied early on in standard curricula in computing.
The main contribution
of this paper is the demonstration that these classic structures fit perfectly in the bialge- 
braic framework. In fact, they may be considered as a paradigmatic example. The paper 
does not contain new results on regular expressions / automata / languages as such, but 
on the way they can (or should) be organised. The proper mathematical language for 
this organisation is categorical. The reader is assumed to be familiar with basic notions 
like functor, natural transformation, (co)monad and adjunction, such as can be found in 
any introductory text on category theory. Our investigations take place in the category 
Sets of ordinary sets and functions. We are well aware that many results generalise to 
other categories, but we do not always strive for the highest level of generality. 


There is already a large body of algebraic work on regular expressions, automata
and languages, for instance within the context of regular algebras [9]. The
coalgebraic perspective on this topic was introduced by Rutten, who demonstrated
its fruitfulness especially for proving equalities via coinduction (using
bisimulations). Rutten’s work exploits the automaton structure on regular
expressions introduced by Brzozowski [8,9]. Here we go a step further by
developing the bialgebraic (combined algebraic-coalgebraic) perspective. This
involves a number of new technical results:

— a general mechanism for obtaining distributive laws and bialgebras for
deterministic automata in Section 3;

— a description of the free algebra and Brzozowski coalgebra structure on regular
expressions as a bialgebra wrt. a (categorical) GSOS law in Subsection 4.3;

— a new proof of Kozen’s completeness result for regular expressions and
languages in Section 5, by describing the coalgebra of regular expressions
modulo equations as a final object. This shows that Kozen’s axioms and rules
give a complete axiomatisation of observational equivalence (bisimilarity) on
regular expressions.


Throughout the paper we heavily rely on previous work, notably [35B T24]. 

We expect that the bialgebraic picture that is emerging constitutes a paradigm which 
also applies to many more computational models (as already suggested in [35]). After
all, regular expressions are extremely elementary, and capture only a very limited form 
of computation. Hence the bialgebraic paradigm is still in need of further instantiation, 
confirmation, and elaboration. 


2 Deterministic Automata as Coalgebras 


This section collects some standard facts about deterministic automata, described as 
coalgebras, in order to determine the setting and fix the notation. 

We use two arbitrary sets $A$ and $B$, where the elements of $A$ may be
understood as letters of an alphabet, and the elements of $B$ as outputs. A
deterministic automaton with $A$ as input and $B$ as output set consists of two
functions:


$$\delta\colon X \to X^{A} \ \text{ for transition} \qquad \varepsilon\colon X \to B \ \text{ for output}$$


acting on a state space $X$. The transition function $\delta$ maps a state
$x \in X$ and an input letter $a \in A$ to a successor state
$x' = \delta(x)(a) \in X$. In that case one may write $x \xrightarrow{a} x'$.
The output function $\varepsilon$ gives for a state $x \in X$ the associated
observable output $\varepsilon(x) \in B$.

The one-step transition function $\delta$ can be extended to a multiple-step
transition function $\delta^*$. The latter takes a state $x \in X$ and a
sequence $\sigma \in A^*$ of inputs to a successor state obtained by
consecutively executing the steps in $\sigma$.


$$\delta^*\colon X \to X^{A^*} \quad\text{defined as}\quad \delta^*(x)(\langle\rangle) = x, \qquad \delta^*(x)(a\cdot\sigma) = \delta^*(\delta(x)(a))(\sigma) \tag{1}$$


This extended transition function $\delta^*$ gives rise to the multiple-step
transition notation: $x \xrightarrow{\sigma}{}^{*} x'$ stands for
$x' = \delta^*(x)(\sigma)$, and means that $x'$ is the (non-immediate) successor
state of $x$ obtained by applying the inputs from the sequence
$\sigma \in A^*$, from left to right.

The behaviour $\mathrm{beh}(x)\colon A^* \to B$ of a state $x \in X$ is then
obtained as the function that maps a finite sequence $\sigma \in A^*$ of inputs
to the observable output


$$\mathrm{beh}(x)(\sigma) = \varepsilon(\delta^*(x)(\sigma)) \in B \tag{2}$$


The transition and output functions $\delta$ and $\varepsilon$ of a
deterministic automaton can be combined into a tuple
$(\delta,\varepsilon)\colon X \to X^{A} \times B$ forming a coalgebra of the
functor $D = D_{A,B}$ given by $U \mapsto U^{A} \times B$. A coalgebra
homomorphism from $((\delta_1,\varepsilon_1)\colon X_1 \to X_1^{A} \times B)$ to
$((\delta_2,\varepsilon_2)\colon X_2 \to X_2^{A} \times B)$ consists of a
function $f\colon X_1 \to X_2$ between the underlying state spaces satisfying:


$$D(f) \circ (\delta_1,\varepsilon_1) = (\delta_2,\varepsilon_2) \circ f.$$


That is, $f^{A} \circ \delta_1 = \delta_2 \circ f$ and
$\varepsilon_1 = \varepsilon_2 \circ f$. Or, more concretely,
$f(\delta_1(x)(a)) = \delta_2(f(x))(a)$ and
$\varepsilon_1(x) = \varepsilon_2(f(x))$, for all $x \in X_1$ and $a \in A$.
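
As an illustration of definitions (1) and (2), a deterministic automaton can be sketched in OCaml as a record pairing the two functions, with finite input sequences represented as lists. The record type, the field names `delta` and `eps`, and the example automaton `parity` are our own choices for this sketch.

```ocaml
(* A deterministic automaton with state space 'x, inputs 'a, outputs 'b:
   the pair (delta, eps) is a coalgebra X -> X^A * B. *)
type ('x, 'a, 'b) automaton = {
  delta : 'x -> 'a -> 'x;  (* one-step transition *)
  eps : 'x -> 'b;          (* observable output *)
}

(* delta_star m x sigma follows the inputs of sigma from left to right,
   as in equation (1). *)
let rec delta_star m x = function
  | [] -> x
  | a :: sigma -> delta_star m (m.delta x a) sigma

(* beh m x is the behaviour A* -> B of state x, as in equation (2). *)
let beh m x sigma = m.eps (delta_star m x sigma)

(* Hypothetical example: the state records whether an even number of
   'a' letters has been read so far; the output is the state itself. *)
let parity = {
  delta = (fun even c -> if c = 'a' then not even else even);
  eps = (fun even -> even);
}
```

With `true` as initial state, `beh parity true` is the characteristic function of the language of words containing an even number of occurrences of the letter a.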

This describes morphisms in a category CoAlg($D$). The following result,
occurring for example in [2,29,16], is simple but often useful. It gives an
explicit description of the final object in the category CoAlg($D$). The proof
is easy, and left to the reader.


Proposition 1. The final coalgebra of the functor $D = (-)^{A} \times B$ for
deterministic automata is given by the set of behaviour functions $B^{A^*}$,
with structure:


$$B^{A^*} \xrightarrow{\ \langle D, E\rangle\ } \left(B^{A^*}\right)^{A} \times B$$


given by:


$$D(\varphi)(a) = \lambda\sigma \in A^*.\ \varphi(a\cdot\sigma) \quad\text{and}\quad E(\varphi) = \varphi(\langle\rangle).$$


As is well-known (after Lambek), the structure map of a final coalgebra is an
isomorphism. The carrier $B^{A^*}$ of the final coalgebra collects all possible
behaviours of deterministic automata. Two special cases are worth mentioning
explicitly.


Example 1. Consider the above final coalgebra
$B^{A^*} \xrightarrow{\ \cong\ } (B^{A^*})^{A} \times B$ of the deterministic
automata functor $D = (-)^{A} \times B$.


1. When $A$ is a singleton set $1 = \{0\}$, so that $A^* = \mathbb{N}$, the
resulting functor $D = (-) \times B$ captures stream coalgebras
$X \to X \times B$. Its final coalgebra is the set $B^{\mathbb{N}}$ of infinite
sequences (streams) of elements of $B$, with (tail, head) structure,


$$B^{\mathbb{N}} \xrightarrow{\ \cong\ } B^{\mathbb{N}} \times B \quad\text{given by}\quad \varphi \mapsto (\lambda n \in \mathbb{N}.\ \varphi(n+1),\ \varphi(0))$$


2. When $B = 2 = \{0,1\}$ describing final (or accepting) states of the
automaton, the final coalgebra $B^{A^*}$ is the set
$\mathcal{L}(A) = \mathcal{P}(A^*)$ of languages over the alphabet $A$, with
structure:


$$\mathcal{L}(A) \xrightarrow{\ \cong\ } \mathcal{L}(A)^{A} \times 2 \quad\text{given by}\quad L \mapsto (\lambda a \in A.\ D(L)(a),\ E(L))$$


where $D(L)(a)$ is the so-called a-derivative, introduced by Brzozowski [8], and
defined as:

$$D(L)(a) = \{\sigma \in A^* \mid a\cdot\sigma \in L\},$$

and where $E(L) = 1 \iff \langle\rangle \in L$.
Given an arbitrary automaton $X \to X^{A} \times \{0,1\}$ of this type, the
resulting behaviour map $\mathrm{beh}\colon X \to \mathcal{P}(A^*)$ thus
describes the language $\mathrm{beh}(x) \subseteq A^*$ accepted by this
automaton with $x \in X$ considered as initial state.

Both these final coalgebras $B^{\mathbb{N}}$ and
$\mathcal{L}(A) = \mathcal{P}(A^*)$ are studied extensively by
Rutten, see [30,33,31]. One of the things that he emphasises is the use of bisimulation as
a reasoning principle. Here we only sketch the main points, for deterministic automata.
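
The coalgebra structure on languages from Example 1 can be sketched directly, under the assumption that a language is represented by its characteristic function on lists of letters. The names `lang`, `deriv`, `accepts_empty` and the example language `even_a` are ours.

```ocaml
(* Assumption: a language over 'a is given by its membership predicate. *)
type 'a lang = 'a list -> bool

(* The a-derivative D(L)(a) = { sigma | a . sigma in L }. *)
let deriv (l : 'a lang) (a : 'a) : 'a lang = fun sigma -> l (a :: sigma)

(* E(L) = 1 iff the empty sequence belongs to L. *)
let accepts_empty (l : 'a lang) : bool = l []

(* Hypothetical example: words with an even number of occurrences of 'a'. *)
let even_a : char lang =
 fun w -> List.length (List.filter (fun c -> c = 'a') w) mod 2 = 0
```

Membership of a word can then be decided by taking successive derivatives and finally testing for the empty word, which is how the final coalgebra structure unfolds a language one letter at a time.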


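Definition 1. Consider two coalgebras
$(\delta_1,\varepsilon_1)\colon X_1 \to X_1^{A} \times B$ and
$(\delta_2,\varepsilon_2)\colon X_2 \to X_2^{A} \times B$. A bisimulation
between them is a relation $R \subseteq X_1 \times X_2$ on the underlying
state spaces that satisfies for all $x_1 \in X_1$, $x_2 \in X_2$,


$$R(x_1, x_2) \implies \varepsilon_1(x_1) = \varepsilon_2(x_2), \text{ and } R(\delta_1(x_1)(a), \delta_2(x_2)(a)), \text{ for all } a \in A.$$


We write $y_1 \mathrel{\underline{\leftrightarrow}} y_2$ and call $y_1, y_2$
bisimilar if there is a bisimulation $R$ with $R(y_1, y_2)$.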


Bisimilarity expresses observational equality, that is, equality as far as one can ob- 
serve with the available (coalgebraic) operations. This explains the following result. 


Proposition 2. In the situation of the previous definition one has: y1 ↔ y2 if and only
if beh_{⟨δ1,ε1⟩}(y1) = beh_{⟨δ2,ε2⟩}(y2).


Proof. The implication (⇒) is easy, since if y1 ↔ y2, say via a bisimulation R with
R(y1, y2), then by induction, R(δ1*(y1)(σ), δ2*(y2)(σ)), for each σ ∈ A*. This yields
beh_{⟨δ1,ε1⟩}(y1)(σ) = ε1(δ1*(y1)(σ)) = ε2(δ2*(y2)(σ)) = beh_{⟨δ2,ε2⟩}(y2)(σ). For the reverse
implication (⇐) one uses that the relation { (x1, x2) | beh_{⟨δ1,ε1⟩}(x1) = beh_{⟨δ2,ε2⟩}(x2) }
is a bisimulation. This follows directly because the beh maps are homomorphisms.
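
For finite automata the definition of bisimulation can be turned into a simple search. The following Python sketch is an assumption-laden illustration (the function name `find_bisimulation`, the pair representation of automata and the two example automata are hypothetical): it tries to build the least bisimulation containing a given pair of states and reports failure as soon as two related states have different outputs.

```python
def find_bisimulation(aut1, aut2, y1, y2, alphabet):
    """Return a bisimulation containing (y1, y2), or None if none exists."""
    (delta1, eps1), (delta2, eps2) = aut1, aut2
    relation, todo = set(), [(y1, y2)]
    while todo:
        x1, x2 = todo.pop()
        if (x1, x2) in relation:
            continue
        if eps1[x1] != eps2[x2]:        # outputs must agree on related states
            return None
        relation.add((x1, x2))
        for a in alphabet:              # successors must be related again
            todo.append((delta1[x1][a], delta2[x2][a]))
    return relation


# Hypothetical example over A = {a}: parity with 2 states versus counting modulo 4.
two = ({"p0": {"a": "p1"}, "p1": {"a": "p0"}}, {"p0": 1, "p1": 0})
four = ({i: {"a": (i + 1) % 4} for i in range(4)},
        {i: 1 - i % 2 for i in range(4)})

print(find_bisimulation(two, four, "p0", 0, "a"))
print(find_bisimulation(two, four, "p0", 1, "a"))   # prints: None
```

By Proposition 2, a non-empty result means that the two start states have the same behaviour, and `None` means that some word distinguishes them.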














States are thus bisimilar if and only if they are equal when mapped to the final coal- 
gebras. Bisimulations provide a means to prove equations via “single-step” arguments. 
This makes coinductive reasoning similar to ordinary inductive approaches. See 
for an abstract account of the underlying dualities. 

Here is a very simple example, already using the regular algebra structure on
languages from Example 3 later on. For each letter a ∈ A one has (1 + a)* = a* in L(A).
This can be proven via the bisimulation R = {((1 + a)*, a*)} ∪ {(0, 0)}.

At some stage we shall need the modal “eventually” operator ◇. Let ⟨δ, ε⟩: X →
X^A × B be an arbitrary coalgebra / automaton. For a predicate (or subset) P ⊆ X we
define ◇(P) ⊆ X as the set of all states that are reachable from P:


    ◇(P) = {δ*(x)(σ) | x ∈ P, σ ∈ A*}.


For a single state we write ◇(x) for ◇({x}). Note that ◇(P) is a subcoalgebra /
subautomaton, because it is by construction closed under transitions. It may be described
as the least invariant containing P, see [17]. The greatest invariant □(P) contained in
P is the predicate {x | ∀σ ∈ A*. δ*(x)(σ) ∈ P}.
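
For a finite state space both operators are computable by a reachability search. The Python sketch below is illustrative only; the names `eventually` and `henceforth` and the three-state automaton are assumptions, not taken from the text.

```python
def eventually(delta, alphabet, P):
    """Least invariant containing P: all states reachable from P."""
    seen, todo = set(P), list(P)
    while todo:
        x = todo.pop()
        for a in alphabet:
            y = delta[x][a]
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


def henceforth(delta, alphabet, P):
    """Greatest invariant inside P: states from which only P is reachable."""
    P = set(P)
    return {x for x in P if eventually(delta, alphabet, {x}) <= P}


# Hypothetical automaton: state 2 is a sink, state 0 is never re-entered from 1 or 2.
delta = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 2}, 2: {"a": 2, "b": 2}}

print(eventually(delta, "ab", {1}))      # prints: {1, 2}
print(henceforth(delta, "ab", {0, 1}))   # prints: set()
```

The empty word is included in the quantification over σ, which is why `henceforth` only needs to inspect states that are already in P.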














3 Structured Output Sets and Distributive Laws 


Rutten (Section 9) studies the situation where the output set B of a coalgebra X →
X^A × B is a semiring. This generalises the situations of final coalgebras
of real-valued streams (B = ℝ) and languages (B = 2). It is shown that the sum and
multiplication operations on B can be extended to the final coalgebras involved.

Here we go a step further and assume an algebra structure β: T(B) → B, for a
monad T: Sets → Sets with unit η: id ⇒ T and multiplication μ: T² ⇒ T. Semirings
then form a special case, see Subsection 3.4. We show how this T-algebra structure
on the output set B induces a distributive law TD ⇒ DT, and a strengthened form
of coinduction using “T-automata”, following the approach of [36,5]. We shall give
several illustrations involving different types of automata, for various concrete monads.
These investigations go a bit beyond what is strictly needed for deterministic automata
and regular languages.

To start, we recall that for an arbitrary monad T and functor G acting on the same
category, a distributive law λ: TG ⇒ GT is a natural transformation that interacts
appropriately with the monad's unit η and multiplication μ. This means that the following
two diagrams commute, written here as equalities of composites.

    GX --η_{GX}--> TGX --λ_X--> GTX
        equals   G(η_X): GX → GTX

    T²GX --T(λ_X)--> TGTX --λ_{TX}--> GT²X --G(μ_X)--> GTX
        equals   T²GX --μ_{GX}--> TGX --λ_X--> GTX


Example 2. The next two illustrations will be used frequently. They both involve the 
so-called strength map. 


1. For each functor T on the category Sets and for each set X there is a natural
transformation st: T((−)^X) ⇒ (T(−))^X. It is usually called strength, and given as
map T(Y^X) → (TY)^X by the formula:


    st(u)(x) = T(λh ∈ Y^X. h(x))(u).


In case T happens to carry a monad structure, the strength map becomes a
distributive law. The above two diagrams then translate into:

    Y^X --η_{Y^X}--> T(Y^X) --st--> (TY)^X
        equals   (η_Y)^X: Y^X → (TY)^X

    T²(Y^X) --T(st)--> T((TY)^X) --st--> (T²Y)^X --(μ_Y)^X--> (TY)^X
        equals   T²(Y^X) --μ_{Y^X}--> T(Y^X) --st--> (TY)^X

(The diligent reader may have noticed that strength is also natural in the functor,
in the sense that for a natural transformation σ: F ⇒ G one has st^G ∘ σ_{Y^X} =
(σ_Y)^X ∘ st^F.)

One useful point about strength for monads is that it allows pointwise construction
of algebras on function spaces: if α: T(Y) → Y is an Eilenberg-Moore algebra,
then so is α^X ∘ st: T(Y^X) → (TY)^X → Y^X.




2. We have formulated the notion of a distributive law for a monad and a functor.
There are several “obvious” variations, for instance for a functor and a comonad.
The next example again involves strength, and is related to the final coalgebra
construction in Proposition 1.

To start, let (M, ·, e) be an arbitrary monoid. It gives rise to a functor (−)^M: Sets →
Sets that turns out to be a comonad. The counit E_X: X^M → X uses the monoid's
unit in E_X(φ) = φ(e), and the comultiplication C_X: X^M → (X^M)^M works via
the monoid's multiplication in C_X(φ) = λa ∈ M. λb ∈ M. φ(a·b).

We claim that for an arbitrary functor F, there is a distributive law F((−)^M) ⇒
(F(−))^M over the comonad (−)^M. This law is again given by strength, and satisfies
the following two “dual” properties.

    F(X^M) --st--> (FX)^M --E_{FX}--> FX
        equals   F(E_X): F(X^M) → F(X)

    F(X^M) --st--> (FX)^M --C_{FX}--> ((FX)^M)^M
        equals   F(X^M) --F(C_X)--> F((X^M)^M) --st--> (F(X^M))^M --st^M--> ((FX)^M)^M

Why is all this relevant? Well, the final coalgebra structure described in
Proposition 1 arises in this manner via the (free) monoid (A*, ·, ⟨⟩) of strings with
concatenation: its observation map E: B^{A*} → B is precisely the above counit E_B,
and its transition map D: B^{A*} → (B^{A*})^A arises from the comultiplication C_B,
via restriction to singleton sequences: D(φ)(a)(σ) = C(φ)(⟨a⟩)(σ). The fact that
strength forms a distributive law will be used in the proof of Proposition 4 below.


As stated in the beginning of this section, we assume an Eilenberg-Moore algebra
β: T(B) → B. By definition it satisfies the algebra laws β ∘ η = id and β ∘ T(β) =
β ∘ μ. Then we can define a distributive law of the monad T over the automata functor
D = (−)^A × B from the previous section, namely:


    λ: TD ⇒ DT   with components   λ_X: T(X^A × B) → (TX)^A × B


This law is obtained as composite:


    T(X^A × B) --⟨T(π1), T(π2)⟩--> T(X^A) × TB --st × β--> (TX)^A × B


The next result summarises what we have found so far. 


Proposition 3. Each Eilenberg-Moore algebra T(B) → B induces a distributive law
λ: TD ⇒ DT for the deterministic automata functor D = (−)^A × B.
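
As a small illustration of Proposition 3, the composite (st × β) ∘ ⟨T(π1), T(π2)⟩ can be written out for one concrete monad. The Python sketch below assumes T is the list monad and B the integers with β = sum (a monoid, hence a list-algebra); the function names and the sample data are hypothetical.

```python
def strength(u):
    """st for T = list: a list of functions becomes a function returning a list."""
    return lambda x: [h(x) for h in u]


def dist_law(u, beta=sum):
    """lambda_X: T(X^A x B) -> (TX)^A x B for T = list, given a list of pairs (f, b)."""
    transitions = [f for f, _ in u]     # T(pi1)
    outputs = [b for _, b in u]         # T(pi2)
    return strength(transitions), beta(outputs)


# Hypothetical element of T(X^A x B): two transition functions with outputs 2 and 3.
u = [(lambda a: a.upper(), 2), (lambda a: a * 2, 3)]
successors, output = dist_law(u)
print(successors("a"), output)          # prints: ['A', 'aa'] 5
```

Other monads fit the same pattern once `strength` and `beta` are replaced by the strength and algebra of that monad.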














When we have an arbitrary monad T, functor G, and a distributive law λ: TG ⇒
GT the relevant associated notion is that of a λ-bialgebra: a pair of maps:


    TX --a--> X --b--> GX

where:
  - a is an Eilenberg-Moore algebra;
  - a and b are compatible via λ, which means that the following diagram commutes.

    TX --a--> X --b--> GX
        equals   TX --T(b)--> TGX --λ_X--> GTX --G(a)--> GX

A map of λ-bialgebras, from (TX --a--> X --b--> GX) to (TY --c--> Y --d--> GY) is
a map f: X → Y that is both a map of algebras and of coalgebras: f ∘ a = c ∘ T(f)
and d ∘ f = G(f) ∘ b.

The next two results are standard, see e.g. [5,18], and are given without proof.


Lemma 1. Assume a distributive law λ: TG ⇒ GT, and let ζ: Z --≅--> GZ be a final
coalgebra. It carries an Eilenberg-Moore algebra α: TZ → Z obtained by finality in:

    TZ --α--> Z --ζ--> GZ
        equals   TZ --T(ζ)--> TGZ --λ_Z--> GTZ --G(α)--> GZ

The resulting pair (TZ --α--> Z --ζ--> GZ) is then a final λ-bialgebra.











Lemma 2. In presence of a distributive law λ: TG ⇒ GT, there exists a bijective
correspondence between GT-coalgebras e: X → GTX (also called equations) and
λ-bialgebras (T²X --μ_X--> TX --> GTX) with free algebra μ_X.

Moreover, let (TY --a--> Y --b--> GY) be a λ-bialgebra. Then there is a bijective
correspondence between “solutions of e” f: X → Y in:

    X --f--> Y --b--> GY
        equals   X --e--> GTX --GT(f)--> GTY --G(a)--> GY

and λ-bialgebra maps g: TX → Y, for the associated equations and λ-bialgebras.
Proposition 4. The assumed algebra β: TB → B induces on the carrier B^{A*} of the
final D-coalgebra from Proposition 1 another T-algebra via a pointwise construction,
namely,


    β̄ = (T(B^{A*}) --st--> (TB)^{A*} --β^{A*}--> B^{A*})


so that E: B^{A*} → B is a homomorphism of algebras. This β̄ is the unique coalgebra
homomorphism from Lemma 1:

    T(B^{A*}) --β̄--> B^{A*} --⟨D, E⟩--> D(B^{A*})
        equals   T(B^{A*}) --T(⟨D, E⟩)--> TD(B^{A*}) --λ--> DT(B^{A*}) --D(β̄)--> D(B^{A*})

using the distributive law from Proposition 3. Hence, this β̄ together with the final
coalgebra forms the final λ-bialgebra: T(B^{A*}) --β̄--> B^{A*} --⟨D, E⟩--> D(B^{A*}).

Proof. According to Lemma 1 it suffices to prove that β̄ is a homomorphism of
coalgebras. Here we use that strength is a distributive law as described in Example 2 (2).

    D(β̄) ∘ λ ∘ T(⟨D, E⟩)
      = ((β^{A*} ∘ st)^A × id) ∘ ⟨st ∘ T(π1), β ∘ T(π2)⟩ ∘ T(⟨D, E⟩)
      = ⟨(β^{A*})^A ∘ st^A ∘ st ∘ T(D), β ∘ T(E)⟩
      = ⟨(β^{A*})^A ∘ D ∘ st, β ∘ E ∘ st⟩
      = ⟨D ∘ β^{A*} ∘ st, E ∘ β^{A*} ∘ st⟩
      = ⟨D, E⟩ ∘ β̄.

The coinduction principle associated with a final λ-bialgebra is called λ-coinduction
in [5]. In the current situation, with the functor D for deterministic automata, the
principle yields a strengthened form of coinduction for “T-automata”.


Theorem 1. For each T-automaton ⟨δ, ε⟩: X → D(TX) = (TX)^A × B, where B
carries a T-algebra β: TB → B, there is a unique map beh: X → B^{A*} making the
following diagram commute.

    X --beh--> B^{A*} --⟨D, E⟩--> D(B^{A*})
        equals   X --⟨δ, ε⟩--> (TX)^A × B = DTX --DT(beh)--> DT(B^{A*}) --D(β̄)--> D(B^{A*})
Proof. This result is a direct consequence of Lemmas 1 and 2, but we like to give the
concrete construction, as in the proof of Proposition 1. First we define an extension
δ*: X → (TX)^{A*} of δ like in (1) by induction:


    δ*(x)(⟨⟩) = η(x)      δ*(x)(a·σ) = μ[st(T(δ*)(δ(x)(a)))(σ)].


Then we can define the required map as:


    beh = (X --δ*--> (TX)^{A*} --(Tε)^{A*}--> (TB)^{A*} --β^{A*}--> B^{A*}).


In the remainder of this section we shall investigate several instantiations of the 
monad T in the results above. 


3.1 The Identity Monad and Deterministic Automata 


If we take T = id, with β = id as identity algebra we get λ = id and β̄ = id, so that
λ-coinduction is just the ordinary form of coinduction for deterministic automata.


3.2 The Powerset Monad and Non-deterministic Automata 


In the above context we now consider the situation where the monad T is the powerset
monad P and where the output set B is 2 = {0,1}. An Eilenberg-Moore algebra of
P is a complete lattice (see e.g. Chapter VI.2, Exercise 1), i.e. a poset with joins
(and hence also meets) of all subsets. Since 2 = P(1), we have a free monad structure
⋃: P(2) → 2 given by union. The strength map st: P(Y^X) → P(Y)^X is st(u)(x) =
{ f(x) | f ∈ u }. The resulting distributive law, say λ^P: PD ⇒ DP, is given by:


    λ^P_X: P(X^A × 2) → P(X)^A × 2
           U ↦ (λa ∈ A. { f(a) | ∃b. (f, b) ∈ U }, ∃f. (f, 1) ∈ U)


The final coalgebra is in this case the set 2^{A*} = P(A*) = L(A) of languages over
the “alphabet” A, see Example 1 (2). The induced algebra structure P(L(A)) → L(A)
is simply union ⋃.

The λ^P-coinduction principle from Theorem 1 tells how a state x of a
non-deterministic automaton is mapped to the associated language (that is accepted
starting from x as initial state):

    X --beh--> L(A) --≅--> DL(A) = L(A)^A × 2
        equals   X --> P(X)^A × 2 = DP(X) --DP(beh)--> DP(L(A)) --D(⋃)--> DL(A)

This was first noted in [5, Corollary 4.4.6].
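
The resulting language semantics of a non-deterministic automaton can be computed word by word. The Python sketch below is a hedged illustration: it processes letters from left to right, which gives the same result as the inductive definition of δ* in the proof of Theorem 1 for the powerset monad; the function name `nfa_beh` and the example automaton are assumptions.

```python
def nfa_beh(delta, eps, x):
    """Language accepted from state x of a non-deterministic automaton."""
    def phi(word):
        current = {x}                                   # unit: the singleton {x}
        for a in word:
            # apply delta to every current state, then flatten by union
            current = set().union(*(delta[s][a] for s in current))
        return int(any(eps[s] for s in current))        # union/disjunction on 2
    return phi


# Hypothetical automaton over {a, b} accepting the words that end in "ab".
delta = {0: {"a": {0, 1}, "b": {0}},
         1: {"a": set(), "b": {2}},
         2: {"a": set(), "b": set()}}
eps = {0: 0, 1: 0, 2: 1}

accepts = nfa_beh(delta, eps, 0)
print(accepts("aab"), accepts("aba"))   # prints: 1 0
```

The intermediate sets `current` are exactly the states of the usual subset construction, so the sketch determinises the automaton on the fly.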
3.3 The Multiset Monad and Weighted Automata


It is well-known that the Kleene-star or list monad X ↦ X* has monoids as
Eilenberg-Moore algebras. The monad M for commutative monoids is given by multisets:


    M(X) = {φ ∈ ℕ^X | φ has finite support},


where the support of φ is the set supp(φ) = {x ∈ X | φ(x) ≠ 0}. Such a φ can thus be
represented as finite sum n1·x1 + ··· + nk·xk of elements xi ∈ X with “multiplicities”
ni = φ(xi) ∈ ℕ. The action M(f) on such a representation is then simply n1·f(x1) +
··· + nk·f(xk). The unit of this monad is x ↦ 1·x and multiplication is n1·φ1 + ··· +
nk·φk ↦ λx ∈ X. n1·φ1(x) + ··· + nk·φk(x).

An M-automaton ⟨δ, ε⟩: X → M(X)^A × 2 is then a so-called weighted automaton.
For a state x ∈ X and letter a ∈ A there may then be several result states xi in the
outcome δ(x)(a) = n1·x1 + ··· + nk·xk, each with a particular “weight” ni.

The set 2 forms a commutative monoid via finite disjunctions ⊥, ∨ (and also via
conjunctions). The disjunctions induce a commutative monoid structure on L(A) given
by union of languages. Since this is an idempotent monoid, the structure of
multiplicities is ignored when a state is mapped to the associated language.
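
The multiset monad and the loss of multiplicities can be illustrated with Python's `collections.Counter`. This is a sketch under assumptions: a multiset of multisets is represented as a list of (multiplicity, multiset) pairs, letters are processed from left to right, and the two-state automaton is hypothetical.

```python
from collections import Counter


def unit(x):
    """Unit of the multiset monad: the multiset 1x."""
    return Counter({x: 1})


def mult(weighted):
    """Multiplication: flatten a formal sum n1*phi1 + ... + nk*phik of multisets."""
    out = Counter()
    for n, phi in weighted:
        for x, k in phi.items():
            out[x] += n * k
    return out


def delta_star(delta, x, word):
    """Weighted set of states reached from x after reading word."""
    current = unit(x)
    for a in word:
        current = mult([(n, delta[s][a]) for s, n in current.items()])
    return current


def accepts(delta, eps, x, word):
    """Output in 2 via disjunction: the weights themselves play no role."""
    return int(any(eps[s] for s in delta_star(delta, x, word)))


# Hypothetical weighted automaton over the alphabet {a}.
delta = {"p": {"a": Counter({"p": 1, "q": 2})},
         "q": {"a": Counter({"q": 1})}}
eps = {"p": 0, "q": 1}

print(delta_star(delta, "p", "aa"))     # multiplicities: p once, q four times
print(accepts(delta, eps, "p", "aa"))   # prints: 1
```

Only the support of the reached multiset matters for `accepts`, which is the idempotency remark above in executable form.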





3.4 The Semiring Monad 


A basic observation is that there is a distributive law of monads π: (−)* ∘ M ⇒ M ∘
(−)* between the list and multiset monads. It is given by multiplication in ℕ:


    π_X: M(X)* → M(X*)

    ⟨φ1, ..., φn⟩ ↦ Σ { φ1(x1) ··· φn(xn) ⟨x1, ..., xn⟩ | xi ∈ supp(φi) }

                  = λ⟨y1, ..., ym⟩ ∈ X*.  0                    if m ≠ n
                                          φ1(y1) ··· φn(yn)    otherwise.


With some perseverance one can prove that π is a natural transformation that commutes
appropriately with the monad structures.
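
The formula for π_X can be sketched in Python, again with `collections.Counter` as multisets and tuples as lists. The function name `pi` and the sample multisets are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def pi(phis):
    """Send a list of multisets to the multiset of lists, multiplying multiplicities."""
    out = Counter()
    supports = [list(phi.items()) for phi in phis]
    for combo in product(*supports):          # one element from each support
        word = tuple(x for x, _ in combo)
        weight = 1
        for _, n in combo:
            weight *= n                       # multiplication in the naturals
        out[word] += weight
    return out


result = pi([Counter({"x": 2, "y": 1}), Counter({"z": 3})])
print(result[("x", "z")], result[("y", "z")])   # prints: 6 3
```

Lists of the wrong length get multiplicity 0 automatically, because `product` only produces tuples with one entry per input multiset; the empty list of multisets is sent to the multiset containing the empty list once.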

It is a standard result that in presence of a distributive law like π: (−)* ∘ M ⇒
M ∘ (−)* the composite M ∘ (−)* is again a monad, see for instance [6,19,4].
Moreover, the multiset monad M can be lifted to a monad M̄ on the category of
(−)*-algebras (monoids), such that the algebras of the composite monad M ∘ (−)* are the
same as M̄-algebras. This functor M̄ maps a monoid (X, ·, 1) to (M(X), •, η(1)) with
multiplication • given by:


    φ • ψ = Σ { φ(x)ψ(y) (x·y) | x ∈ supp(φ), y ∈ supp(ψ) }.


An Eilenberg-Moore algebra (M(X), •, η(1)) → (X, ·, 1) for the monad M̄
consists of a commutative monoid m: M(X) → X whose structure map m preserves the
monoid structure. Such an algebra of the composite monad is thus a semiring.
Therefore we call the monad the semiring monad, and write it as S(X) = M(X*).
Rutten (Section 9) explicitly considers deterministic automata X → X^A × B
where the set B is a semiring, i.e. carries an Eilenberg-Moore algebra S(B) → B.
This includes his main examples B = ℝ and B = 2. In those cases the final coalgebra
B^{A*} is also a semiring, via pointwise construction. Theorem 1 yields for a “semiring”
automaton X → S(X)^A × 2 a mapping X → L(A) to languages over A.


3.5 The Language Monad 


The language monad L(X) = P(X*) can be constructed similarly to the semiring
monad S(X) = M(X*), namely via a distributive law. The algebras of the language
monad are Kleene algebras with arbitrary joins, also known as unital quantales.
Theorem 1 then yields behaviours for states of “language
automata” X → L(X)^A × B. They resemble alternating automata [27].


4 Regular Expressions 


As is well-known, regular expressions are built up from constants 0, 1, letters a ∈ A
from a given alphabet A, sum s + t, composition s · t and Kleene-star s*. These
operations form an algebra of the functor:


    R(X) = 1 + 1 + (X × X) + (X × X) + X


where we ignore the alphabet for a moment, because it will turn up in the associated
monad below. The initial algebra of this functor R is not so interesting: it consists of
the (closed) terms that can be obtained from 0, 1 via +, ·, (−)*. Notice that at this stage
there are no equations involved. They will appear in the next section.


Example 3. For an arbitrary set U, the set of languages L(U) = P(U*) over U carries
an R-algebra structure R(L(U)) → L(U). It is given by the familiar definitions


    zero term case:   0 ↦ ∅
    one term case:    1 ↦ {⟨⟩}
    sum case:         (L1, L2) ↦ L1 ∪ L2
    product case:     (L1, L2) ↦ {σ1 · σ2 | σi ∈ Li}
    star case:        L ↦ ⋃_{n∈ℕ} L^n.


since a single (algebra) map R(L(U)) → L(U) jointly describes five maps of the
form 1 → L(U), 1 → L(U), L(U) × L(U) → L(U), L(U) × L(U) → L(U) and
L(U) → L(U), giving the individual operations of regular algebra.

For the special case where U = ∅ we get an algebra structure on L(∅) = P(∅*) =
P(1) = 2. This structure R(2) → 2 uses 0, ∨ and 1, ∧ as additive and multiplicative
monoids, and the constant map x ↦ 1 as star operation.
Usually one considers regular expressions over an alphabet A. It means that the
letters a ∈ A are used as atoms to build up regular expressions. This can be done via the
free monad R* generated by R. It is defined on a set A as the initial algebra of the
functor X ↦ A + R(X). We shall sometimes write Re_A for the carrier R*(A) of regular
expressions over A, or simply Re if the alphabet A is clear from the context. This set
Re is built up inductively from 0, 1, a ∈ A using the regular operations +, ·, (−)*.

We thus have an initiality isomorphism [η_A, τ_A]: A + R(Re) --≅--> Re, where the
map τ_A: R(R*(A)) → R*(A) is the free R-algebra on A. The extension map σ: R ⇒
R* is then given by σ = τ ∘ R(η).

The next result collects the basics about this situation. 


Lemma 3. In the situation described above: 


1. The functor A ↦ R*(A) is a monad, whose category of Eilenberg-Moore algebras
is isomorphic to the category of R-algebras. The multiplication of this monad is
defined by initiality in:

    R*A + R(R*R*A) --id + R(μ_A)--> R*A + R(R*A)
          | [η, τ] ≅                     | [id, τ_A]
          v                              v
        R*R*A -----------μ_A---------> R*A


2. The R-algebra on 2 from Example 3 yields a distributive law λ: R*D ⇒ DR* for
the deterministic automaton functor D = (−)^A × 2.











Proof. The first point is standard, and the second is a special case of Proposition 3.


With this result, an R-algebra from Example 3, say r: R(L(U)) → L(U),
corresponds to a unique Eilenberg-Moore algebra r̄: R*(L(U)) → L(U) with r̄ ∘ σ = r.
Especially for U = ∅ this yields an algebra R*(2) → 2 that will be used in (4) below.
The multiplication μ maps a term s(t1, ..., tn) built up from other terms t1, ..., tn, as
atoms, to the term s[t1, ..., tn] obtained by substituting these ti into s.





Example 4. The standard interpretation of the set Re_A of regular expressions over an
alphabet A in the set L(A) of languages over A may be understood as the unique
homomorphism of algebras:

    R*(Re_A) = R*R*A --R*([[−]])--> R*(L(A))
          | μ_A                        |
          v                            v                 with [[η(a)]] = {⟨a⟩}.
       Re_A = R*A ------[[−]]------> L(A)

The Eilenberg-Moore algebra on L(A) arises from the R-algebra from Example 3.
Freeness of μ_A and the inclusion {⟨−⟩}: A → L(A) does the rest.
Usually one does not make a clear distinction between an expression like s = 1 +
a*ba* ∈ Re_A and its interpretation [[s]] = 1 ∪ a*ba* ∈ L(A). Here however, we like
to keep the two apart, and use an explicit interpretation function [[−]].
4.1 Two Questions

Given this basic set-up, we ask ourselves the following two questions.


1. Is there a coalgebra/automaton structure ⟨D, E⟩ on regular expressions such that
the above interpretation [[−]] is also a homomorphism of coalgebras, as in:

         R*R*A ------R*([[−]])------> R*(L(A))
           | μ_A                         |
           v                             v
      Re_A = R*A -------[[−]]-------> L(A)                  (3)
           | ⟨D, E⟩ ??                   | ≅ ⟨δ, ε⟩
           v                             v
      (R*A)^A × 2 ---[[−]]^A × id---> L(A)^A × 2

2. Is this diagram a map between two κ-bialgebras, for a suitable distributive law κ?


We address this matter in the next two subsections. The first question can be answered
positively, and involves Brzozowski's “derivative” and “non-empty word” operations on
regular expressions from [8,9]. The second question will be solved by a special kind of
distributive law, following the so-called GSOS format. It puts the concrete construction
of Brzozowski in the general framework developed in [35].


4.2 Regular Expressions as Coalgebras 


From a coalgebraic perspective the most interesting part of regular expressions is that
they form a deterministic automaton ⟨D, E⟩: Re → Re^A × 2 = D(R*(A)).
The output operation E: Re → 2 is obtained by freeness as the unique map in

    R*(Re) ---R*(E)---> R*(2)
       | μ                 |
       v                   v          with E(η(a)) = 0          (4)
       Re -------E-------> 2

where the algebra structure R*(2) → 2 is as described before Example 4. Commutation
of the diagram (4) yields the equations E(0) = 0, E(1) = 1, E(s + t) = E(s) ∨ E(t),
E(s · t) = E(s) ∧ E(t) and E(s*) = 1. This operation E describes what is sometimes
called the empty word property.

Since the values of E(s) ∈ 2 are either 0 or 1, we shall often treat E(s) as a term in
Re.

By induction on the structure of a term s ∈ Re one checks the first bi-implication:


    E(s) = 1 ⟺ ⟨⟩ ∈ [[s]] ⟺ ε([[s]]) = 1


Hence ε ∘ [[−]] = E, which is one part of the lower square in (3).
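
The equations for E amount to a definition by structural induction, which is easy to write down. In the Python sketch below regular expressions are encoded as nested tuples; this encoding and the constructor tags ("zero", "one", "sym", "plus", "times", "star") are assumptions made for the illustration.

```python
ZERO, ONE = ("zero",), ("one",)


def E(s):
    """Empty word property: 1 if the expression s accepts the empty word, else 0."""
    tag = s[0]
    if tag == "zero":
        return 0
    if tag == "one":
        return 1
    if tag == "sym":                     # a letter: E(eta(a)) = 0
        return 0
    if tag == "plus":
        return E(s[1]) | E(s[2])         # disjunction
    if tag == "times":
        return E(s[1]) & E(s[2])         # conjunction
    if tag == "star":
        return 1
    raise ValueError(tag)


a, b = ("sym", "a"), ("sym", "b")
a_star_b_a_star = ("times", ("times", ("star", a), b), ("star", a))
print(E(("plus", ONE, a_star_b_a_star)), E(a_star_b_a_star))   # prints: 1 0
```

Each clause uses only the results on the immediate subterms, which is what makes this an ordinary inductive (freeness) definition.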
The “derivative” operation D: Re → Re^A is more complicated. It is due to
Brzozowski [8], see also [9]. We shall use the common notation D_a(s) for the successor
term D(s)(a). The derivative is defined by the following clauses (or rules).


    D_a(0) = 0                      D_a(s + t) = D_a(s) + D_a(t)
    D_a(1) = 0                      D_a(s · t) = D_a(s) · t + E(s) · D_a(t)      (5)
    D_a(b) = 1 if b = a             D_a(s*) = D_a(s) · s*.
             0 otherwise.


Is this a proper inductive definition? The problem is in the clause for composition,
where the term t is used in the subterm D_a(s) · t in original form. Similarly for s in
the star case. Hence we cannot use an inductive/freeness definition like for E in (4).
We have to use recursion to deal with the additional parameter. The remainder of this
subsection elaborates the required formulation of recursion.

A categorical analysis of strengthened induction principles for a functor F is given
in [36] in terms of distributive laws between F and a comonad, dual to the approach
underlying Theorem 1. We shall use this approach in the current situation where F is
the functor A + R(−) for regular expressions described in the beginning of this section
and the comonad is simply (−) × D for a set D, with coalgebra Δ = ⟨id, id⟩. We
concentrate on the result, and refer to [36] for the distributive law involved.


Theorem 2 (Recursion following [36]). An initial algebra α: F(D) --≅--> D satisfies
the following strengthened induction property: for each map f: F(X × D) → X there
is a unique map h: D → X making the following diagram commute.

    F(D) --F(Δ)--> F(D × D) --F(h × D)--> F(X × D)
      | α ≅                                   | f
      v                                       v
      D -------------------h----------------> X

Proof. We shall give a direct proof, ignoring the distributivity properties involved. Let
f: F(X × D) → X therefore be given. Write f' = ⟨f, α ∘ F(π2)⟩: F(X × D) → X ×
D. It gives by initiality rise to a unique map k: D → X × D with k ∘ α = f' ∘ F(k).
Then π2 ∘ k = id by uniqueness of algebra maps α → α. Hence we take h = π1 ∘ k.
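
The proof reduces recursion to plain initiality by carrying the argument along in a pair. The Python sketch below follows that construction for the simplest case F(X) = 1 + X, whose initial algebra is the natural numbers; the encoding of the summand 1 as `None`, the names `fold` and `recursion`, and the factorial example are assumptions chosen for illustration.

```python
def fold(step, n):
    """Initiality for F(X) = 1 + X: k(0) = step(None), k(n + 1) = step(k(n))."""
    acc = step(None)
    for _ in range(n):
        acc = step(acc)
    return acc


def recursion(f):
    """Build h from f: F(X x D) -> X as h = pi1 . k, with k obtained by initiality."""
    def f_prime(arg):                    # f' = <f, alpha . F(pi2)>
        if arg is None:
            return (f(None), 0)          # alpha on the summand 1 is zero
        x, d = arg
        return (f((x, d)), d + 1)        # alpha on the summand D is successor
    return lambda n: fold(f_prime, n)[0]


# Factorial needs the extra parameter: h(n + 1) = (n + 1) * h(n).
factorial = recursion(lambda arg: 1 if arg is None else (arg[1] + 1) * arg[0])
print(factorial(5))                      # prints: 120
```

The second component of the pair computed by `fold` is always the argument itself, which is the statement π2 ∘ k = id from the proof.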














With this theorem the derivative operation D: Re → Re^A can be obtained by
recursion from a map [f1, f2]: A + R(Re^A × Re) → Re^A in:

    A + R(Re) --id + R(⟨D, id⟩)--> A + R(Re^A × Re)
        | [η, τ] ≅                       | [f1, f2]              (6)
        v                                v
        Re -------------D-------------> Re^A

The map f1: A → Re^A is defined as f1(a) = λb ∈ A. if b = a then 1 else 0. And
f2: R(Re^A × Re) → Re^A is given by the following cases.


    zero term case:   0 ↦ λa ∈ A. 0
    one term case:    1 ↦ λa ∈ A. 0
    sum case:         ((φ1, s1), (φ2, s2)) ↦ λa ∈ A. φ1(a) + φ2(a)
    product case:     ((φ1, s1), (φ2, s2)) ↦ λa ∈ A. φ1(a) · s2 + E(s1) · φ2(a)
    star case:        (φ, s) ↦ λa ∈ A. φ(a) · s*.


Commutation of the diagram (6) now yields the appropriate clauses (5) for the derivative
function. Further, by induction on s ∈ Re one proves:


    [[D(s)(a)]] = D([[s]])(a)            as in Example 1 (2)
                = {σ ∈ A* | a·σ ∈ [[s]]}.


This means that [[−]] is a homomorphism of both algebras and coalgebras in (3). This
settles our first question from Subsection 4.1. In particular, the operational
semantics ([[−]] as coalgebra homomorphism) is compositional (i.e. is an algebra
homomorphism).
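
Since [[−]] is a coalgebra homomorphism, membership of a word in [[s]] can be decided by taking derivatives letter by letter and then testing the empty word property. The Python sketch below implements the clauses (5) directly; the tuple encoding of expressions and all function names are assumptions for illustration, and no simplification of the derived terms is attempted.

```python
ZERO, ONE = ("zero",), ("one",)


def E(s):
    """Empty word property, by structural induction."""
    tag = s[0]
    if tag in ("zero", "sym"):
        return 0
    if tag in ("one", "star"):
        return 1
    if tag == "plus":
        return E(s[1]) | E(s[2])
    return E(s[1]) & E(s[2])             # remaining case: "times"


def D(s, a):
    """Brzozowski derivative D_a(s), following the clauses (5)."""
    tag = s[0]
    if tag in ("zero", "one"):
        return ZERO
    if tag == "sym":
        return ONE if s[1] == a else ZERO
    if tag == "plus":
        return ("plus", D(s[1], a), D(s[2], a))
    if tag == "times":                   # uses the subterm s[2] in original form
        e = ONE if E(s[1]) else ZERO
        return ("plus", ("times", D(s[1], a), s[2]), ("times", e, D(s[2], a)))
    return ("times", D(s[1], a), s)      # remaining case: "star"


def accepts(s, word):
    """Run the automaton <D, E> on word, starting in the expression s."""
    for a in word:
        s = D(s, a)
    return E(s)


a, b = ("sym", "a"), ("sym", "b")
s = ("plus", ONE, ("times", ("times", ("star", a), b), ("star", a)))   # 1 + a*ba*
print(accepts(s, ""), accepts(s, "aba"), accepts(s, "abba"))           # prints: 1 1 0
```

The product and star clauses are where recursion, rather than induction, is needed: they refer to the original subterm as well as to its derivative.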

We now turn to the second question from Subsection 4.1.


4.3 Regular Expressions as Bialgebras 


Since the derivative operation D: Re → Re^A is defined by recursion (instead of
induction), the distributive laws and bialgebras described in Section 3 do not work in this
situation. Interestingly, the so-called GSOS format does work. It has been developed
in syntactic form for process calculi [7,14], and formulated categorically in [35]. We
follow the latter approach (see also [5]). The main point is that these GSOS laws have
an extra parameter, like in recursion.


Definition 2. For a monad T and functor G, a GSOS law is a distributive law of the
form λ: T(G × id) ⇒ (G × id)T with π2 ∘ λ = T(π2).

A λ-model, or GSOS model, for such a GSOS law λ, consists of an
Eilenberg-Moore algebra a: TX → X and a coalgebra b: X → GX on the same state space,
such that the pair TX --a--> X --⟨b, id⟩--> GX × X is a λ-bialgebra; equivalently, such that
the following diagram commutes.

    TX --a--> X --b--> GX
        equals   TX --T(⟨b, id⟩)--> T(GX × X) --π1 ∘ λ--> GTX --G(a)--> GX

The formulation of GSOS law that we use is not quite the same as in [35]. The
latter handles the special case where the monad T is free, i.e. of the form F*. The
next result shows that this special case of our definition is equivalent to the “natural
transformation” formulation used in [35].


Proposition 5. Let F be an arbitrary endofunctor with associated free monad F*.
There is a bijective correspondence between:


    GSOS laws   λ: F*(G × id) ⇒ (G × id)F*
    =========================================
    natural transformations   ρ: F(G × id) ⇒ GF*


We use an overline-notation λ ↦ λ̄, ρ ↦ ρ̄ for this correspondence, in both directions.


Correspondingly, F*X --a--> X --b--> GX is a λ-model (as in Definition 2) if and only
if the following diagram commutes.

    FX --a ∘ σ_X--> X --b--> GX
        equals   FX --F(⟨b, id⟩)--> F(GX × X) --λ̄_X--> GF*X --G(a)--> GX

In view of this result, we shall often also call a natural transformation F(G × id) ⇒
GF* a GSOS law.


Proof. We only describe the constructions, and leave the details to the interested
readers. For the correspondence between GSOS laws and natural transformations, first
assume a GSOS law λ: F*(G × id) ⇒ (G × id)F*. It gives rise to a natural transformation:


    λ̄ = (F(GX × X) --σ--> F*(GX × X) --λ--> GF*X × F*X --π1--> GF*X)


Conversely, for ρ: F(G × id) ⇒ GF* we define a distributive law ρ̄ = ⟨ρ1, ρ2⟩: F*(G ×
id) ⇒ (G × id)F* where ρ2 = F*(π2) and ρ1 is defined by recursion (following
Theorem 2) in:

    (GX × X) + F(F*(GX × X)) --id + F(⟨ρ1, id⟩)--> (GX × X) + F(GF*X × F*(GX × X))
          | [η, τ] ≅                                   | [G(η) ∘ π1, G(μ) ∘ ρ ∘ F(id × F*(π2))]
          v                                            v
      F*(GX × X) ----------------ρ1-----------------> GF*X

The equivalence with respect to models amounts for F*X --a--> X --b--> GX to:

    ⟨b, id⟩ ∘ a = (G × id)(a) ∘ ρ̄_X ∘ F*(⟨b, id⟩)
        iff
    b ∘ a ∘ σ_X = G(a) ∘ ρ_X ∘ F(⟨b, id⟩)

The direction from left to right is straightforward, and the reverse direction requires the
use of uniqueness in recursion.














Example 5. The regular expression functor R(X) = 1 + 1 + (X × X) + (X × X) + X
and the deterministic automaton functor D(X) = X^A × 2 are connected via a GSOS
law:


    ρ_X: R(X^A × 2 × X) → R*(X)^A × 2

    zero:      0 ↦ (λa ∈ A. 0, 0)
    one:       1 ↦ (λa ∈ A. 0, 1)
    plus:      ((φ1, b1, x1), (φ2, b2, x2)) ↦ (λa ∈ A. φ1(a) + φ2(a), b1 ∨ b2)
    product:   ((φ1, b1, x1), (φ2, b2, x2)) ↦ (λa ∈ A. φ1(a) · x2 + b1 · φ2(a), b1 ∧ b2)
    star:      (φ, b, x) ↦ (λa ∈ A. φ(a) · x*, 1).


One recognises the clauses/rules for D and E as described in the previous subsection.
Their format can thus be expressed via a GSOS law; see [5] for more information about
such correspondences. We shall illustrate that this law is fundamental, in the sense that
it induces familiar structure (and associated results) on regular expressions.
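
To see that the GSOS law really packages the rules for D and E, one can implement ρ once and derive the automaton structure on expressions from it. The Python sketch below is an illustration under assumptions: expressions are nested tuples, the variables of R*(X) are taken to be expressions themselves (so that the multiplication μ is implicit in the tuple encoding), and the names `rho`, `step` and `accepts` are hypothetical.

```python
ZERO, ONE = ("zero",), ("one",)


def rho(node):
    """GSOS law on an R-node whose arguments are triples (phi, b, x)."""
    tag = node[0]
    if tag == "zero":
        return (lambda a: ZERO), 0
    if tag == "one":
        return (lambda a: ZERO), 1
    if tag == "plus":
        (p1, b1, _), (p2, b2, _) = node[1], node[2]
        return (lambda a: ("plus", p1(a), p2(a))), b1 | b2
    if tag == "times":
        (p1, b1, _), (p2, b2, x2) = node[1], node[2]
        unit = ONE if b1 else ZERO       # the bit b1 treated as a term
        return (lambda a: ("plus", ("times", p1(a), x2),
                           ("times", unit, p2(a)))), b1 & b2
    if tag == "star":
        p, _, x = node[1]
        return (lambda a: ("times", p(a), ("star", x))), 1
    raise ValueError(tag)


def step(s):
    """Pair (D(s), E(s)), obtained from rho by recursion on the expression s."""
    if s[0] == "sym":                    # the equation on letters
        return (lambda b: ONE if b == s[1] else ZERO), 0
    children = [step(c) + (c,) for c in s[1:]]   # triples (D(c), E(c), c)
    return rho((s[0],) + tuple(children))


def accepts(s, word):
    for a in word:
        s = step(s)[0](a)
    return step(s)[1]


a, b = ("sym", "a"), ("sym", "b")
s = ("plus", ONE, ("times", ("times", ("star", a), b), ("star", a)))   # 1 + a*ba*
print(accepts(s, "aba"), accepts(s, "abba"))                           # prints: 1 0
```

Note that `rho` never inspects its arguments x beyond copying them into the result, which reflects the naturality of the law in X.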


There are a number of general results about GSOS laws that put our running
example in perspective. We shall concentrate on these results first, and return to the example
of regular expressions at the end of this subsection. The next two results are the
analogues for GSOS laws of Lemmas 1 and 2. The proof of the second one uses a form of
recursion for Eilenberg-Moore algebras.


Lemma 4. If we have a GSOS law λ: T(G × id) ⇒ (G × id)T, then a final coalgebra
ζ: Z --≅--> GZ induces a final λ-model with algebra α: TZ → Z defined by coinduction:

    TZ --α--> Z --ζ--> GZ
        equals   TZ --T(⟨ζ, id⟩)--> T(GZ × Z) --π1 ∘ λ--> GTZ --G(α)--> GZ

Proof. By uniqueness one obtains that α is an Eilenberg-Moore algebra. By
construction the pair (α, ζ) is a λ-model. It is final because for an arbitrary λ-model TX →
X → GX the induced coalgebra map X → Z is also an algebra map, again proven
by uniqueness.














Lemma 5. Given a GSOS law λ: T(G × id) ⇒ (G × id)T there is a bijective
correspondence between GT-coalgebras and λ-models with free algebra:


    “equations”   X --> GTX
    ==========================================
    λ-models      T²X --μ--> TX --> GTX


and also between corresponding solutions and bialgebra maps.


Proof. The proof relies on the following “recursion” version of freeness for
Eilenberg-Moore algebras: for each f: X → Y and a: T(Y × TX) → Y there is a unique map g
in:

    T²X --T(⟨g, id⟩)--> T(Y × TX)
      | μ                    | a
      v                      v            with g ∘ η = f          (7)
      TX --------g---------> Y

provided that a satisfies a ∘ η = π1 and a ∘ μ = a ∘ T(⟨a, μ ∘ T(π2)⟩). The proof of
this property is much like the proof of Theorem 2 and left to the reader.

We only describe the correspondence between equations and GSOS models, and
leave the rest to the interested reader. Given e: X → GTX define ē via (7) in:

    T²X --T(⟨ē, id⟩)--> T(GTX × TX)
      | μ                    | G(μ) ∘ π1 ∘ λ
      v                      v            with ē ∘ η = e
      TX --------ē---------> GTX

By construction this forms a λ-model. In the reverse direction, given d: TX → GTX
one takes d̄ = d ∘ η: X → GTX. Then applying both constructions to e gives back
ē ∘ η = e. And applying both constructions to d gives back d; this follows by
uniqueness, using that (μ, d) is a GSOS model: G(μ) ∘ π1 ∘ λ ∘ T(⟨d, id⟩) = d ∘ μ.














Remark 1. 1. If we apply the construction of the previous lemma starting from a law
ρ: F(G × id) ⇒ GF* like in Proposition 5, then the GSOS model F*F*X --μ-->
F*X --ē--> GF*X associated with an equation e: X → GF*X can be described
via recursion (like in Theorem 2) as:

    X + F(F*X) --id + F(⟨ē, id⟩)--> X + F(GF*X × F*X)
        | [η, τ] ≅                       | [e, G(μ) ∘ ρ]
        v                                v
       F*X --------------ē------------> GF*X

This will be used later.

2. A known result (cited as Proposition 5.1) shows that a (GSOS) law ρ: F(G × id) ⇒ GF*
induces a lifting of the free monad F* to the category CoAlg(G). The construction uses
the previous point: it takes a coalgebra b: X → GX to the coalgebra-part of the
bialgebra corresponding to the equation G(η) ∘ b: X → GF*X.


With all these general GSOS results in place we are finally in a position to analyse
the situation of regular expressions and languages, using the GSOS law from
Example 5.

Theorem 3. 1. The “equation” A → D(R*(A)) that is given by the two maps

    A → (R*A)^A,   a ↦ λb ∈ A. if b = a then 1 else 0
    A → 2,         a ↦ 0

corresponds by Lemma 5 to the free algebra and Brzozowski automaton structure
on the set Re = R*(A) of regular expressions:

    R*(Re) --μ--> Re --⟨D, E⟩--> Re^A × 2

2. The final D-coalgebra L(A) --≅--> L(A)^A × 2 of languages yields by Lemma 4 the
final bialgebra:

    R*(L(A)) --> L(A) --≅--> L(A)^A × 2

with the standard algebra of regular expressions.

3. The interpretation [[−]]: Re → L(A) introduced via freeness in (3) can also be
obtained as beh: Re → L(A) by finality using the previous two points.

4. Bisimilarity between regular expressions is a congruence: s ↔ s' and t ↔ t'
implies s + t ↔ s' + t', s · t ↔ s' · t' and s* ↔ s'*.


Proof. 1. Let’s write e: A > Re“ x 2 for the equation. We need to check that the 
Brzozowski structure (D, Æ) from Subsection [4.2] fits in the description in Re- 
mark([I](1), i.e. that the following diagram commutes, 


A+R(Re) JES ENED) PEE K2x Re) 


[n,7] | |ie (u4 x id) o p] 


Re Re^ x 2 





(D, E) 
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where p is as described in Example[5] This diagram commutes because the Brzo- 
zowski structure (D, E) precisely follows the GSOS law p. 

2. Similarly we need to show that the standard interpretation a: R(L(A)) → L(A) 
yields a commuting diagram in Lemma 4. This means that ⟨δ, ε⟩ ∘ a = (a^A × id) ∘ 
ρ ∘ R(⟨⟨δ, ε⟩, id⟩), which can be checked easily, where ⟨δ, ε⟩ is the final coalgebra 
structure on L(A). 

3. Obvious, since ⟦−⟧ is also a map of coalgebras. 

4. The bisimilarity relation ↔ ↪ Re × Re is the equaliser e at the bottom row below, 
because of Proposition 2 and because ⟦−⟧ = beh by the previous point. 


    R*(↔) ----d----> R*(Re) × R*(Re) ====> R*(L(A))     via R*(⟦−⟧) ∘ π1 and R*(⟦−⟧) ∘ π2 
      :                    |                   | 
      :                  μ × μ                 | 
      v                    v                   v 
      ↔  ----e---->     Re × Re      ====>   L(A)       via ⟦−⟧ ∘ π1 and ⟦−⟧ ∘ π2 


The map d = ⟨R*(π1 ∘ e), R*(π2 ∘ e)⟩ induces an algebra structure on the 
relation ↔, as indicated. This makes ↔ a congruence. 














The map ⟦−⟧: Re → L(A) defined by initiality is by construction “compositional”, 
in the sense that it preserves the operations. This map describes what may be called the 
denotational semantics of regular expressions. In contrast, the map beh: Re → L(A) 
obtained by finality describes the operational semantics, because it is induced by the 
dynamical (coalgebra) structure on regular expressions. The equality of denotational 
⟦−⟧ and operational beh semantics in point 3 of the previous theorem says in particular 
that the operational semantics is compositional, so that for instance the behaviour of a 
sum expression is the sum of the behaviours of the two summands. Many coincidences 
of operational and denotational semantics are described in more concrete form in [B]. 


5 Regular Expressions with Equations 


An equational logic for regular expressions is formulated by Kozen in [23], for which a 
completeness theorem is proved. An alternative proof of completeness (again by Kozen) 
is given in [24]. Here we shall give a coalgebraic review of the situation, which leads to 
a third completeness proof. It is similar to, but shorter than, the proof in [24]. 

Throughout this section we fix a finite alphabet A. We shall indicate where we need 
this finiteness (in Definition 4). 

The definition of Kleene algebra involves a particular formulation of the 
rules for the star operation. It requires for an algebra [0, 1, +, ·, (−)*]: R(Y) → Y that 
(Y, 0, 1, +, ·) is an idempotent semiring in which the star axioms and rules in point 2 
below hold. 

One can also turn the set Re of regular expressions into a Kleene algebra via a 
suitable quotient. For clarity we shall use a special symbol = ⊆ Re × Re for the least 
relation satisfying the next three points. 
1. (Re, +, 0, ·, 1) is an idempotent semiring, i.e. 
   - (Re, +, 0) is an idempotent commutative monoid, in which one defines a 
     partial order by s ≤ t ⟺ s + t = t. 
   - (Re, ·, 1) is a monoid, where · preserves the additive monoid structure +, 0 in 
     both arguments: s·(t + r) = (s·t) + (s·r) and (t + r)·s = (t·s) + (r·s), 
     and also s·0 = 0 and 0·s = 0. 

2. The star inequalities and rules: 

    1 + s·s* ≤ s*        1 + s*·s ≤ s* 
    t + s·x ≤ x  ⟹  s*·t ≤ x        t + x·s ≤ x  ⟹  t·s* ≤ x 

3. Axioms and rules making = a congruence, i.e. an equivalence relation preserved 
   by the operations: s = s' and t = t' implies s + t = s' + t', s·t = s'·t' and 
   s* = s'*. 

We shall write Re/= for the set of regular expressions modulo =. By construction it 
forms a Kleene algebra. As usual, we often simply write s for the equivalence class 
[s] = {t ∈ Re | t = s} ∈ Re/=. 

Of the many results that can be derived in Kleene algebras we shall need the fol- 
lowing ones. 


Lemma 6. In an arbitrary Kleene algebra one has: 

1. 1 + s·s* = s*; 
2. s·x = x·t implies s*·x = x·t*. 

And each term s ∈ Re satisfies s ≥ Σ_{a∈A} a·D_a(s) + E(s). 


Proof. The inequality 1 + s·s* ≤ s* is one of the star axioms. And s* ≤ 1 + s·s* is 
obtained by applying a star rule to the inequality 1 + s·x ≤ x for x = 1 + s·s*. 

For the second point it suffices to show: if s·x ≤ x·t then s*·x ≤ x·t*. The 
latter can be obtained via a star rule from x + s·(x·t*) ≤ x·t*, which follows from 
the assumption s·x ≤ x·t. 

The final inequality s ≥ Σ_{a∈A} a·D_a(s) + E(s) is obtained by induction on the 
structure of s ∈ Re. 
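
The two appeals to a star rule in this proof can be spelled out as follows. This is only a sketch: it assumes the star rule in the form $t + s\cdot x \le x \Rightarrow s^{*}\cdot t \le x$, the axiom $1 + s\cdot s^{*} \le s^{*}$, and monotonicity of $+$ and $\cdot$ with respect to $\le$.

```latex
% First point: take x = 1 + s s^* and t = 1 in the star rule.
\begin{align*}
1 + s\cdot(1 + s\cdot s^{*})
  &\le 1 + s\cdot s^{*}
  && \text{since } 1 + s\cdot s^{*} \le s^{*} \\
\Longrightarrow\quad s^{*} = s^{*}\cdot 1
  &\le 1 + s\cdot s^{*}
  && \text{by the star rule.}
\end{align*}

% Second point: assume s x <= x t and apply the star rule
% with x t^* in the role of the solution and x in the role of t.
\begin{align*}
x + s\cdot(x\cdot t^{*})
  &\le x + (x\cdot t)\cdot t^{*}
  && \text{by the assumption } s\cdot x \le x\cdot t \\
  &= x\cdot(1 + t\cdot t^{*}) \\
  &\le x\cdot t^{*}
  && \text{since } 1 + t\cdot t^{*} \le t^{*} \\
\Longrightarrow\quad s^{*}\cdot x
  &\le x\cdot t^{*}
  && \text{by the star rule.}
\end{align*}
```

The reverse inequality in the second point follows in the same way from the symmetric star rule.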














The following two standard lemmas (see e.g. [9,24,31]) must be made explicit first. 


Lemma 7. 1. The derivative operation on regular expressions preserves equality, i.e. 
satisfies s = t ⟹ D_a(s) = D_a(t), for each letter a. Similarly, s = t ⟹ E(s) = 
E(t). 

The Brzozowski coalgebra structure (D, E): Re → Re^A × 2 thus restricts to 
(D, E): (Re/=) → (Re/=)^A × 2, making the quotient map [−]: Re → Re/= 
a homomorphism of coalgebras. 

2. If s = t then ⟦s⟧ = ⟦t⟧, i.e. s, t yield the same languages. Hence the diagram 
of bialgebras can be further refined by taking images: 

    R(Re) ------> R(Re/=) ------> R(L_r(A)) ------> R(L(A)) 
      |              |                |                | 
      v              v                v                v 
      Re  --[−]-->  Re/=  --⟦−⟧-->  L_r(A)  ------>  L(A)          (8) 
      |              |                |                | 
    (D, E)         (D, E)             |             ⟨δ, ε⟩ 
      v              v                v                v 
    Re^A × 2 --> (Re/=)^A × 2 --> L_r(A)^A × 2 --> L(A)^A × 2 

where L_r(A) is the subset of regular (also called rational) languages obtained as 
interpretation ⟦s⟧ of a regular expression s. 


The completeness result states that the (restricted) homomorphism ⟦−⟧ in 
the middle of (8) is an isomorphism, see Theorem 4 below. 














Proof. By induction on the length of derivations of =. 


The derivative operation D: Re → Re^A yields a multiple derivative D*: Re → 
Re^{A*}. Similarly we get D*: Re/= → (Re/=)^{A*} for expressions modulo 
equations. We shall also use the subscript notation in these situations (and drop the 
star), so that D_σ(s) = D*(s)(σ) with cases D_⟨⟩(s) = s and D_{a·σ}(s) = D_σ(D_a(s)). 


Lemma 8. Expressions modulo equations have only finitely many successors: for each 
term/state s ∈ Re the set Q(s) = {D_σ(s) | σ ∈ A*} ⊆ Re/= of successors of s in the 
coalgebra Re/= → (Re/=)^A × 2 is finite. 


Proof. The basic terms are easy, since Q(0) = {D_σ(0) | σ ∈ A*} = {0}, Q(1) = 
{1, 0} and Q(a) = {a, 1, 0}. For the compound terms one first proves the following 
equations. 

    D_σ(s + t) = D_σ(s) + D_σ(t) 
    D_σ(s·t)   = D_σ(s)·t + Σ_{τ·ρ = σ, ρ ≠ ⟨⟩} E(D_τ(s))·D_ρ(t) 
    D_σ(s*)    = D_σ(1) + D_σ(s)·s* + Σ_{τ·ρ = σ, τ,ρ ≠ ⟨⟩} E(D_τ(s))·D_ρ(s*). 

These equations are obtained by induction on the length of σ ∈ A*. 
If we now write #Q(s) ∈ N for the number of elements of Q(s), then: 

    #Q(0) = 1        #Q(s + t) ≤ #Q(s)·#Q(t) 
    #Q(1) = 2        #Q(s·t) ≤ #Q(s)·2^{#Q(t)} 
    #Q(a) = 3        #Q(s*) ≤ #Q(s)·2^{#Q(s)}. 

Hence we can conclude that each subset Q(s) ⊆ Re/= is finite. 
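
The finiteness of the set of successors can be observed concretely. The following is a minimal sketch, not the construction used in the proof: it represents regular expressions as nested tuples, identifies them only modulo a few semiring laws (associativity, commutativity and idempotence of +, and the unit and zero laws), and collects all iterated Brzozowski derivatives. The names `plus`, `dot`, `star`, `D`, `E`, `successors` and `accepts` are illustrative choices. Since these laws all hold in the quotient by the equations, the count obtained is an upper bound for the number of successors modulo equations.

```python
# Regular expressions as nested tuples:
#   ("0",), ("1",), ("sym", a), ("+", frozenset of terms), (".", s, t), ("*", s)
ZERO = ("0",)
ONE = ("1",)


def sym(a):
    return ("sym", a)


def plus(s, t):
    """Sum, normalised modulo associativity, commutativity, idempotence and s + 0 = s."""
    terms = set()
    for r in (s, t):
        if r[0] == "+":
            terms |= r[1]
        elif r != ZERO:
            terms.add(r)
    if not terms:
        return ZERO
    if len(terms) == 1:
        return next(iter(terms))
    return ("+", frozenset(terms))


def dot(s, t):
    """Product, normalised modulo s.0 = 0 = 0.s and s.1 = s = 1.s."""
    if s == ZERO or t == ZERO:
        return ZERO
    if s == ONE:
        return t
    if t == ONE:
        return s
    return (".", s, t)


def star(s):
    return ("*", s)


def E(s):
    """Output of a state: 1 if the expression accepts the empty word, else 0."""
    tag = s[0]
    if tag in ("0", "sym"):
        return 0
    if tag in ("1", "*"):
        return 1
    if tag == "+":
        return max(E(r) for r in s[1])
    return E(s[1]) & E(s[2])  # product


def D(a, s):
    """Brzozowski derivative of s with respect to the letter a."""
    tag = s[0]
    if tag in ("0", "1"):
        return ZERO
    if tag == "sym":
        return ONE if s[1] == a else ZERO
    if tag == "+":
        out = ZERO
        for r in s[1]:
            out = plus(out, D(a, r))
        return out
    if tag == ".":
        left = dot(D(a, s[1]), s[2])
        return plus(left, D(a, s[2])) if E(s[1]) else left
    return dot(D(a, s[1]), s)  # star


def successors(s, alphabet):
    """All iterated derivatives of s, including s itself."""
    seen = {s}
    todo = [s]
    while todo:
        r = todo.pop()
        for a in alphabet:
            d = D(a, r)
            if d not in seen:
                seen.add(d)
                todo.append(d)
    return seen


def accepts(s, word):
    """Run the word through the derivative automaton and read off the output."""
    for a in word:
        s = D(a, s)
    return E(s) == 1
```

For instance, `successors(dot(star(plus(sym("a"), sym("b"))), sym("a")), "ab")` explores the automaton of the expression (a + b)*·a, and `accepts` runs a word through it.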
Next we shall define a category in which the Brzozowski automaton on Re/= lives. 


Definition 3. We write DetAut_fb for the category of deterministic automata with finite 
behaviour. Objects are coalgebras ⟨δ, ε⟩: X → X^A × 2 such that for each state x ∈ X 
the set of successors Q(x) = {δ*(x)(σ) | σ ∈ A*} ⊆ X is finite. Maps in DetAut_fb 
are the usual homomorphisms of coalgebras. 

(Notice that we leave the set A of inputs implicit in the notation.) 


It is not hard to see that if f: (X → X^A × 2) → (Y → Y^A × 2) is a surjective 
coalgebra homomorphism where X → X^A × 2 is in DetAut_fb, then so is Y → Y^A × 2. 
The reason is that f(δ*(x)(σ)) = δ*(f(x))(σ), and so Q(f(x)) ⊆ f[Q(x)]. Hence the 
automaton structure L_r(A) → L_r(A)^A × 2 from (8) is also in the category DetAut_fb, 
via the surjection ⟦−⟧: Re/= → L_r(A). 

A basic property of Kleene algebras is that an inequality x ≥ s·x + t has a least 
solution s*·t, via the star rule and via s*·t ≥ s·(s*·t) + t. Even stronger, the latter is 
actually an equality, since s·(s*·t) + t = (s·s* + 1)·t = s*·t. 

This can be generalised to equations in multiple variables, using the standard fact 
that square matrices in Kleene algebras form again Kleene algebras, and can be used to 
solve equations, see Section 3]. A system of n equations: 


    x_i = s_i1·x_1 + ... + s_in·x_n + t_i 

has a least solution that can be described as vector S*·T where 

    S = (s_ij), the n × n matrix with rows (s_i1 ... s_in),   and   T = (t_1, ..., t_n), written as a column vector, 

describe the equation as x = S·x + T and the star operation S* is in the Kleene 
algebra of n × n matrices. 


Definition 4. Let ⟨δ, ε⟩: X → X^A × 2 be an arbitrary coalgebra with finite behaviour 
(i.e. an object of DetAut_fb). With each state x ∈ X we associate a term ⌜x⌝ ∈ Re/= 
in the following way. 

By assumption Q(x) is finite, say Q(x) = {x_1, x_2, ..., x_n} where x_1 = x. An 
n × n transition matrix S_x = (s_ij) and an output vector T_x = (t_i) over Re/= are 
constructed with elements 

    s_ij = Σ {a ∈ A | δ(x_i)(a) = x_j}   and   t_i = ε(x_i). 

We then take ⌜x⌝ ∈ Re/= to be the first element of the least solution S_x*·T_x of the 
associated equations. More formally, as vector product, ⌜x⌝ = (1 0 ... 0)·S_x*·T_x. 


The sum Σ in this definition exists because we have assumed that the alphabet A is 
finite. The sum over an empty set is 0, as usual. Notice that the ordering of the elements 
in Q(x) is not relevant. 

One can understand S as a big square matrix X × X → Re/= defined by (x, x') ↦ 
Σ {a | δ(x)(a) = x'} like in [24]. The matrix S_x in the definition is then the restriction 
of S to {x_1, ..., x_n} ⊆ X. 
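
To make the recipe of this definition concrete, here is a small sketch that computes an expression for a state of a finite deterministic automaton. It is an illustration under stated assumptions rather than the matrix construction itself: instead of forming the star of the transition matrix, it eliminates the unknowns one at a time, using the fact that an equation x = s·x + t has least solution s*·t. Expressions are rendered as Python `re` patterns, with `None` standing for 0 and the empty string for 1; the function names and the dictionary encoding of the automaton are hypothetical.

```python
import re
from itertools import product


def r_plus(s, t):
    """Sum of two expressions (None is 0)."""
    if s is None:
        return t
    if t is None:
        return s
    if s == t:
        return s
    return f"(?:{s}|{t})"


def r_dot(s, t):
    """Product of two expressions (None is 0, the empty pattern is 1)."""
    if s is None or t is None:
        return None
    return s + t


def r_star(s):
    """Star of an expression; 0* and 1* are both 1."""
    if s is None or s == "":
        return ""
    return f"(?:{s})*"


def state_expression(delta, accepting, start, alphabet):
    """Expression for `start`, or None if no accepting state is reachable."""
    # Successors of the start state, with the start state first.
    states = [start]
    for x in states:
        for a in alphabet:
            if delta[x][a] not in states:
                states.append(delta[x][a])
    n = len(states)
    index = {x: i for i, x in enumerate(states)}
    # Transition matrix S and output vector T of the associated equations.
    S = [[None] * n for _ in range(n)]
    for x in states:
        for a in alphabet:
            i, j = index[x], index[delta[x][a]]
            S[i][j] = r_plus(S[i][j], re.escape(a))
    T = ["" if x in accepting else None for x in states]
    # Eliminate the unknowns x_n, ..., x_1 in turn.
    for k in range(n - 1, -1, -1):
        loop = r_star(S[k][k])
        for j in range(k):
            S[k][j] = r_dot(loop, S[k][j])
        T[k] = r_dot(loop, T[k])
        for i in range(k):
            c = S[i][k]
            if c is None:
                continue
            for j in range(k):
                S[i][j] = r_plus(S[i][j], r_dot(c, S[k][j]))
            T[i] = r_plus(T[i], r_dot(c, T[k]))
            S[i][k] = None
    return T[0]


def agrees(delta, accepting, start, alphabet, max_len=5):
    """Compare the expression with the automaton on all words up to max_len."""
    expr = state_expression(delta, accepting, start, alphabet)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            x = start
            for a in word:
                x = delta[x][a]
            matched = expr is not None and re.fullmatch(expr, word) is not None
            if matched != (x in accepting):
                return False
    return True
```

For the two-state automaton `{0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 0}}` with accepting state 1, the resulting pattern matches exactly the words ending in a, which `agrees` checks on all short words.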
Lemma 9. The mapping x ↦ ⌜x⌝ is a homomorphism of coalgebras. 

Proof. Consider x = x_1 ∈ X as in Definition 4. We need to show: 

    E(⌜x⌝) = ε(x)   and   D(⌜x⌝)(a) = ⌜δ(x)(a)⌝. 

We notice that the vector of solutions in Re/= can be described as ⌜x_i⌝. Hence 

    ⌜x_1⌝ = s_11·⌜x_1⌝ + ... + s_1n·⌜x_n⌝ + ε(x_1), 

where each s_ij is a sum of atoms/letters from A. Thus: 

    E(⌜x⌝) = E(s_11·⌜x_1⌝ + ... + s_1n·⌜x_n⌝ + ε(x_1)) 
           = (E(s_11) ∧ E(⌜x_1⌝)) ∨ ... ∨ (E(s_1n) ∧ E(⌜x_n⌝)) ∨ E(ε(x_1)) 
           = (0 ∧ E(⌜x_1⌝)) ∨ ... ∨ (0 ∧ E(⌜x_n⌝)) ∨ ε(x_1) 
           = ε(x_1) 

    D(⌜x⌝)(a) = D(s_11·⌜x_1⌝ + ... + s_1n·⌜x_n⌝ + ε(x_1))(a) 
              = D(s_11·⌜x_1⌝)(a) + ... + D(s_1n·⌜x_n⌝)(a) 
              = D(s_11)(a)·⌜x_1⌝ + E(s_11)·D(⌜x_1⌝)(a) + ... + 
                D(s_1n)(a)·⌜x_n⌝ + E(s_1n)·D(⌜x_n⌝)(a) 
              = D(s_11)(a)·⌜x_1⌝ + ... + D(s_1n)(a)·⌜x_n⌝ 
              = ⌜x_j⌝   if δ(x)(a) = x_j 
              = ⌜δ(x)(a)⌝. 

















By finality this homomorphism ⌜−⌝ yields a commuting diagram: 

    beh = ( X --⌜−⌝--> Re/= --⟦−⟧--> L_r(A) ↪ L(A) ) 

In particular, when X = L_r(A), we see that ⌜−⌝ is a section of ⟦−⟧. 

Corollary 1. The coalgebra L_r(A) → L_r(A)^A × 2 is final in the category DetAut_fb. 


Proof. Given a coalgebra X → X^A × 2 in DetAut_fb there is a composition of 
homomorphisms ⟦−⟧ ∘ ⌜−⌝: X → Re/= → L_r(A). If we have two homomorphisms 
f, g: X → L_r(A), then by postcomposition with the inclusion L_r(A) ↪ L(A) we 
get two homomorphisms to the final (−)^A × 2 coalgebra, which must thus be equal. 
Hence also f = g. 














At this stage we can obtain Kleene’s theorem [21], as point 2 below. Point 1 is 
Theorem 10.1]. 
Corollary 2. 1. A language L ∈ L(A) is regular (i.e. belongs to L_r(A) ↪ L(A)) if 
and only if the set of derivatives Q(L) is finite. 
2. A language L ∈ L(A) is regular if and only if it is accepted by a finite automaton 
(i.e. an automaton with a finite state space). 


Proof. 1. If L ∈ L_r(A), then Q(L) is finite because L_r(A) is in DetAut_fb. 
Conversely, if Q(L) is finite, then Q(L) can be considered as a subcoalgebra Q(L) ↪ 
L(A) that belongs to DetAut_fb. Hence it factors as Q(L) → L_r(A). 

2. If L is regular, then Q(L) is itself a finite automaton (by 1) with initial state L ∈ 
Q(L) whose behaviour beh(L) ∈ L_r(A) is L itself. Conversely, if L ∈ L(A) is 
beh(x) for an initial state x ∈ X of a finite automaton, then Q(x) is finite, so 
L = beh(x) ∈ L_r(A) because L_r(A) is final in DetAut_fb. 


The next two lemmas and their proofs are reformulations of results in [24]. 
Lemma 10. If f: X → Y is a homomorphism in DetAut_fb, then ⌜f(x)⌝ = ⌜x⌝. 

Proof. If Q(x) = {x_1, ..., x_n} where x_1 = x, then Q(f(x)) = {f(x_1), ..., f(x_n)}. 
The latter set may be smaller than the former. We shall consider the following three 
square matrices S, f, S^f: {1, ..., n}² → Re/=. 

    S_ij     = Σ {a | x_i --a--> x_j} 
    (S^f)_ij = Σ {a | f(x_i) --a--> f(x_j)} 
    f_ij     = 1 if f(x_i) = f(x_j), and 0 otherwise. 

Then there is an equality of matrix products: 

    (S·f)_ij = Σ_k S_ik·f_kj 
             = Σ { Σ {a | x_i --a--> z} | z ∈ Q(x) ∧ f(z) = f(x_j) } 
             = Σ {a | ∃z ∈ Q(x). x_i --a--> z ∧ f(z) = f(x_j)} 
             = Σ {a | f(x_i) --a--> f(x_j)} 
             = Σ {a | ∃z ∈ Q(x). f(z) = f(x_i) ∧ f(z) --a--> f(x_j)} 
             = Σ { Σ {a | f(z) --a--> f(x_j)} | z ∈ Q(x) ∧ f(z) = f(x_i) } 
             = Σ_k f_ik·(S^f)_kj 
             = (f·S^f)_ij. 

Lemma 6 (2) now yields S*·f = f·(S^f)*. If we write T for the vector of elements 
ε(x_i) = ε(f(x_i)), then f·T = T, since 

    (f·T)_i = Σ_k f_ik·T_k = Σ {ε(x_k) | f(x_k) = f(x_i)} 
            = ⋁ {ε(x_k) | f(x_k) = f(x_i)} = ε(x_i) = T_i. 

Hence: 

    ⌜x⌝ = (1 0 ... 0)·S*·T = (1 0 ... 0)·S*·f·T 
        = (1 0 ... 0)·f·(S^f)*·T 
        = (f_11 ... f_1n)·(S^f)*·T 
        = Σ {⌜f(x_i)⌝ | f(x_i) = f(x_1)} = ⌜f(x)⌝. 

The last equation holds even though S^f may be “too big” a matrix, describing too 
many equations. These additional equations however are repeated equations, which do 
not influence the least solution. 














Lemma 11. The homomorphism ⌜−⌝: Re/= → Re/= is the identity. 


Proof. We first establish the following points. 

1. ⌜s⌝ ≤ s, for s ∈ Re/=; 

2. ⌜1⌝ = 1 and ⌜0⌝ = 0; 

3. s ≤ t implies ⌜s⌝ ≤ ⌜t⌝; 

4. s ≤ ⌜s⌝. 

The first and fourth point then yield the required result. 

As to the first point, for s ∈ Re/= we obtain ⌜s⌝ via the recipe in Definition 4, 
namely by considering the successor states/derivatives Q(s) = {s_1, ..., s_n} and the 
associated transition matrix. By Lemma 6 these terms s_1, ..., s_n satisfy the defining 
inequality for ⌜s_i⌝, so that ⌜s_i⌝ ≤ s_i, since ⌜s_i⌝ is the least solution. 

The term 1 has one successor, namely 0. The associated single equation, following 
Definition 4, is x = 1, which has as (least) solution ⌜1⌝ = 1. Similarly ⌜0⌝ = 0. 

For the third point we consider the product X = Re/= × Re/= as state space with 
two coalgebra structures (D, E_1), (D, E_2): X → X^A × 2, where 

    D(s, t)(a) = (D_a(s), D_a(t))      E_1(s, t) = E(s)      E_2(s, t) = E(t). 

The projections π_i: X → Re/= are then homomorphisms from (D, E_i) to (D, E). 
Hence Lemma 10 applies. Given elements s, t ∈ Re/=, let S = S_(s,t) be the 
transition matrix associated with (s, t) ∈ X, and T_1, T_2 be the associated output 
vectors determined by the output functions E_1, E_2 respectively. Thus, if s ≤ t, then 
E_1(s, t) ≤ E_2(s, t) and similarly for all successors of (s, t), because D and E are 
order preserving. Hence T_1 ≤ T_2, and thus: 

    ⌜s⌝ = ⌜π_1(s, t)⌝ = ⌜(s, t)⌝ w.r.t. (D, E_1) 
        = (1 0 ... 0)·S*·T_1 
        ≤ (1 0 ... 0)·S*·T_2 
        = ⌜(s, t)⌝ w.r.t. (D, E_2) 
        = ⌜π_2(s, t)⌝ 
        = ⌜t⌝. 

For the fourth point we prove the stronger statement ∀t ∈ Re. s·⌜t⌝ ≤ ⌜s·t⌝ 
by induction on s. We are then done by taking t = 1, using point 2. 

- 0·⌜t⌝ = 0 = ⌜0⌝ = ⌜0·t⌝. 

- 1·⌜t⌝ = ⌜t⌝ = ⌜1·t⌝. 

- For a letter b ∈ A: 

    ⌜b·t⌝ ≥ Σ_{a∈A} a·D_a(⌜b·t⌝) + E(⌜b·t⌝)      by Lemma 6 
          = Σ_{a∈A} a·⌜D_a(b·t)⌝ + E(b·t)        by Lemma 9 
          = b·⌜t⌝                                by point 2. 

- (s_1 + s_2)·⌜t⌝ = s_1·⌜t⌝ + s_2·⌜t⌝ 
                  ≤ ⌜s_1·t⌝ + ⌜s_2·t⌝            by induction hypothesis 
                  ≤ ⌜s_1·t + s_2·t⌝              by point 3 
                  = ⌜(s_1 + s_2)·t⌝. 

- (s_1·s_2)·⌜t⌝ = s_1·(s_2·⌜t⌝) 
                ≤ s_1·⌜s_2·t⌝                    by induction hypothesis 
                ≤ ⌜s_1·(s_2·t)⌝                  by induction hypothesis 
                = ⌜(s_1·s_2)·t⌝. 

- Finally, s*·⌜t⌝ ≤ ⌜s*·t⌝ is obtained by applying the star rule to: 

    ⌜t⌝ + s·⌜s*·t⌝ ≤ ⌜t⌝ + ⌜s·(s*·t)⌝            by induction hypothesis 
                   ≤ ⌜t + s·(s*·t)⌝              by point 3 
                   = ⌜(1 + s·s*)·t⌝ 
                   = ⌜s*·t⌝                      by Lemma 6. 

















Theorem 4 (Completeness). The Brzozowski coalgebra Re/= → (Re/=)^A × 2 
is final in DetAut_fb. Hence the (bialgebra) homomorphism ⟦−⟧: Re/= → L_r(A) is an 
isomorphism. 


Proof. Each object X → X^A × 2 in DetAut_fb yields a homomorphism ⌜−⌝: X → 
Re/= by Lemma 9. Suppose we have two homomorphisms f, g: X → Re/=, then by 
Lemmas 10 and 11 we have: 

    f(x) = ⌜f(x)⌝ = ⌜x⌝ = ⌜g(x)⌝ = g(x). 

Final objects are unique up to isomorphism, so the coalgebra homomorphism ⟦−⟧ = 
beh: Re/= → L_r(A) is an isomorphism by Corollary 1. 














Another way to formulate this result is: Kozen’s axioms and rules give a complete 
axiomatisation of bisimilarity for regular expressions. Indeed, for s, t ∈ Re, 

    s ↔ t  ⟺  beh(s) = beh(t)      by Proposition 2 
           ⟺  ⟦s⟧ = ⟦t⟧            by Theorem 3 (3) 
           ⟺  [s] = [t]            by Theorem 4, where [−]: Re → Re/= 
           ⟺  s = t. 

This gives a perfect bialgebraic match, where the equational logic on the algebra-side 
completely captures the observational equivalence on the coalgebra-side. Similar 
results occur for instance within a line of work in process algebra. 
6 Conclusions 


We have illustrated the effectiveness of the bialgebraic approach introduced by Turi and 
Plotkin by showing how it neatly connects the elementary and classic structures of 
computer science, namely regular expressions, automata and languages. It thus forms a 
framework for what we consider to be the essence of computing: generated behaviour 
via matching algebra-coalgebra pairs. This framework may even guide developments 
in settings which are more complicated and possibly less well-developed, like extended 
regular expressions [22], or timed and probabilistic automata and their languages. 
Abstract. We present a way of viewing labelled transition systems as 
sheaves: these can be thought of as systems of observations over a topol- 
ogy, with the property that consistent local observations can be pasted 
together into global observations. We show how this approach extends to 
hierarchical structures of labelled transition systems, where behaviour is 
taken as a limit construction. Our examples show that this is particularly 
effective when transition systems have structured states. 


1 Introduction 


Despite many advances in developing calculi and formal models for concurrent 
processes, it is still difficult to reason effectively about large systems, which may 
comprise many subcomponents related in intricate ways, and have a correspond- 
ingly large state space. It is therefore important to find compositional methods 
of specifying, analysing, and reasoning about hierarchical, distributed and con- 
current processes based on coherent notions of observation and behaviour. 

Goguen has proposed sheaf theory as a semantic foundation for the 
study of concurrent and distributed systems. Sheaf theory is concerned with the 
transition from local to global properties, and a sheaf can be thought of as a 
system of observations made at various locations in a topology, with the key 
property that consistent local observations can be uniquely pasted together to 
provide a global observation. Thus, the semantics of a distributed system could 
be couched in terms of the topology of the system and the local observations 
that could be made of its various parts, and the overall, global behaviour of 
the system then emerges from the behaviour of its parts. Goguen’s paper builds 
upon earlier work on Categorical General Systems Theory [7,8], and together 
these papers provide a rich variety of different kinds of systems, including musical 
pieces [9,12], that do indeed give rise very naturally to sheaves. The approach has 
been used to give semantics to Petri nets by Lilius [16], and to object-oriented 
languages, originally by Wolfram and Goguen [25], and also by Ehrich, Goguen 
and Sernadas [6], and by Cirstea [4]. 

Many of the examples of Goguen’s sheaf-theoretic approach use discrete time 
as a topology: here, behaviour is observed locally at particular intervals of time, 
and the global behaviour is the behaviour over the union of these intervals. 
Cirstea’s work also provides a relationship between sheaves on discrete time and 
transition systems, and strengthens the arguments for a sheaf-theoretic approach 
by showing that transition systems give rise to sheaves: a transition system has 
© Springer-Verlag Berlin Heidelberg 2006 


406 Grant Malcolm 


an ‘underlying’ sheaf. In this paper we further explore the possibility of using 
sheaf theory to provide a semantic foundation for distributed concurrent sys- 
tems, by exploring their relationship to labelled transition systems. We present 
an adjunction that provides translations between labelled transition systems and 
sheaves on a topology of traces, i.e., prefix-closed sets of words over the alphabet 
of labels. However, our main interest is in systems built from subcomponents. 
We show that the adjunction extends to hierarchically structured transition sys- 
tems by using a principle from Goguen’s Categorical Systems Theory: behaviour 
is limit. Although colimits are also used to model ways of combining concurrent 
processes (see, e.g., [22]), we would suggest that for processes with a structured 
notion of state, limits provide the most useful ways of combining systems. In- 
deed, the limit constructions we consider provide ways of structuring states. The 
following section contains some examples; see also [17]. As a consequence of the 
emphasis that we place on states, we are less interested in notions such as bisim- 
ulation. We show that the adjunction between transition systems and sheaves 
extends to hierarchical systems, and that the translation from transition systems 
to sheaves preserves limits and, hence, behaviour. 

We assume familiarity with basic notions from category theory: functor, nat- 
ural transformation, limit and adjunction (see [[0[1] for introductions). 


This paper is dedicated with great affection to Joseph Goguen on his 
sixty-fifth birthday. I had the privilege and pleasure of working as a research assistant 
with Joseph for several years; I can think of no better apprenticeship for a 
computer scientist. His wealth of ideas and breadth of vision were stimulating 
and inspirational, and I am delighted to dedicate this to him as inspiration, 
teacher, and friend. 


2 Transition Systems and Sheaves 


The following subsections review labelled transition systems and sheaves, and 
present an adjunction between them. We begin by recalling some basic definitions 
concerning labelled transition systems, which we generalise to allow transitions 
to take labels in an arbitrary monoid. 


2.1 Transition Systems 


Definition 1. A labelled transition system over L is a pair (T, →), where 
T is a set of states, and → ⊆ T × L × T is the transition relation. We write 
t --l--> t' for (t, l, t') ∈ →, and we will usually refer to a transition system (T, →) 
simply as T. A pointed transition system is a transition system T with a 
distinguished initial state t_0 ∈ T. 

A morphism of transition systems over L, (T_1, →_1) → (T_2, →_2), is a function 
f : T_1 → T_2 such that if t --l-->_1 t' then f(t) --l-->_2 f(t'); a morphism of pointed 
transition systems in addition maps the initial state to the initial state. 

When there is no risk of confusion, we will drop subscripts and decorations on 
the arrow ‘→’, and simply write, for example, ‘if t --l--> t' then f(t) --l--> f(t')’. 

Note that there is another common definition of morphism in the literature 
(see, e.g., [23,22]), whereby morphisms ‘lift’ transitions, in the sense that for a 
morphism f : T_1 → T_2, if f(t) --l--> t' in T_2, then there is some t_1 ∈ T_1 such that 
t --l--> t_1 in T_1 and f(t_1) = t'. This definition is particularly useful in studying 
bisimulation, but since that is beyond the scope of the present paper, we use the 
simpler definition above. 


Example 1. Any subset S ⊆ L* of lists over L gives rise to a transition system 
(S, →), where w --l--> w' iff w' = wl. For example, take S = {ε, a, aa, ab} ⊆ 
{a, b}*, where ε is the empty list. Then we have ε --a--> a, a --a--> aa, and a --b--> ab, 
describing a simple transition system with a ‘fork’ at state a. Any morphism 
f : (S, →) → (T, →) describes a similarly forking path (or run) in T: both 

    f(ε) --a--> f(a) --a--> f(aa)   and 
    f(ε) --a--> f(a) --b--> f(ab). 


The following is an example that we will refer to later on in Section 3, where 
we consider hierarchical structures of transition systems. 


Example 2. Consider a coffee dispenser that dispenses coffee only after payment 
has been received. Later on, we will see an example concerning a coin slot that 
accepts payment; for the moment we simply assume a boolean value that says 
whether or not payment has been received. The states of the coffee dispenser are 
pairs consisting of a boolean value and a number between 0 and 20, indicating 
the level of coffee available; that is, the state set is Bool x {0..20}. 

There are three labels for transitions: d for dispensing coffee, n for notification 
that payment has been received, and r for refilling. Transitions are described 
exhaustively as follows: 


- (true, N) --d--> (false, N − 1) for all 0 < N ≤ 20, 
- (false, N) --n--> (true, N) for all 0 ≤ N ≤ 20, and 
- (B, N) --r--> (B, 20) for all B ∈ Bool and all 0 ≤ N ≤ 20. 


Any transition system over L extends to a transition system over L* with 

    t --ε--> t'    iff  t = t' 
    t --lw--> t'   iff  t --l--> t'' and t'' --w--> t' for some t''. 

That is, transitions can be freely extended to paths of transitions. We can use 
this to provide a slightly more general notion of transition system that takes 
labels in a monoid. 
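
The coffee dispenser and the extension of its transitions to words of labels can be sketched in a few lines. This is an illustrative encoding only: states are pairs of a boolean and a level between 0 and 20, the function names `step` and `run` are hypothetical, and the bounds assume that dispensing needs at least one unit of coffee while notification and refilling are possible at every level.

```python
# States are pairs (paid, level) with level in 0..20.
STATES = [(paid, level) for paid in (False, True) for level in range(21)]


def step(state, label):
    """Set of states reachable from `state` by one transition with `label`."""
    paid, level = state
    if label == "d":  # dispense: needs payment and at least one unit of coffee
        return {(False, level - 1)} if paid and level > 0 else set()
    if label == "n":  # notification that payment has been received
        return {(True, level)} if not paid else set()
    if label == "r":  # refill to the maximum level
        return {(paid, 20)}
    return set()


def run(state, word):
    """Set of states reachable from `state` along the word of labels `word`."""
    current = {state}
    for label in word:
        current = {t for s in current for t in step(s, label)}
    return current
```

Here `run(state, "")` returns only `state` itself, and running a word `w + v` amounts to running `v` from every state reached by `w`, which mirrors the two clauses for transitions over words.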
Definition 2. Let M = (M, ·, ε) be a monoid (we generally write mm' in place 
of m·m'); we say m is a prefix of m', and write m ≤ m', iff m' = mn for 
some n ∈ M, and we say that a subset X ⊆ M is prefix-closed iff x ≤ y and 
y ∈ X implies x ∈ X. We write Ω(M) for the set of all prefix-closed subsets of 
M (including M itself), and for m ∈ M, we write m↓ for the set of all prefixes 
of m (including m itself). 

A labelled transition system over M is a pair (T, →), with → ⊆ 
T × M × T such that 

    t --ε--> t'    iff  t = t' 
    t --mn--> t'   iff  t --m--> t'' and t'' --n--> t' for some t'' ∈ T. 

A morphism f : (T, →) → (T', →') of transition systems over M is a function 
f : T → T' such that t --m--> t' implies f(t) --m-->' f(t'). This gives a category LTS_M 
of transition systems over M. 
We usually refer to a labelled transition system (T, →) simply as T. 


In the sequel, we will be interested in transition systems that are built from 
other transition systems by taking limits. For the present, we note 


Proposition 1. The category LTS_M is complete. 


Limits are constructed from limits of the underlying state sets; we will see 
some examples in Section 3. 


2.2 Sheaves 


Sheaf theory is used in many branches of mathematics, the underlying theme in 
its various applications being the passage from local to global properties [13]. 
It provides a formal notion of coherent systems of observations: a number of 
consistent observations of various aspects of an object can be uniquely pasted 
together to give an observation over all of those aspects. The passage from local 
to global properties, and the pasting together of local observations of behaviour 
allow sheaf theory to be usefully applied in computer science, to give models 
for concurrent processes [19,5,16] and objects [11,6,25,4,17]. We give a basic 
definition of ‘sheaf’ below; fuller accounts can be found in [21,15]. 

We may consider a sheaf as giving a set of observations of an object’s be- 
haviour from a variety of ‘locations’. The notion of location is formalised by the 
following 


Definition 3. A complete Heyting algebra is a partially ordered set (C, ≤) 
such that: 

• for all c, d ∈ C, there is a greatest lower bound c ∧ d 
• for all subsets {c_i | i ∈ I} of C, there is a least upper bound ⋁_{i∈I} c_i 
• greatest lower bounds distribute through least upper bounds: 

    (⋁_{i∈I} c_i) ∧ d = ⋁_{i∈I} (c_i ∧ d). 

For example, any topological space with the inclusion ordering between open sets 
is a complete Heyting algebra; also, any complete lattice in which finite meets 
distribute over arbitrary joins is a complete Heyting algebra. In particular, the set 
of prefix-closed subsets of a monoid, Ω(M), is a complete Heyting algebra. 

Like any preorder, a complete Heyting algebra C can be seen as a category; 
in particular, the opposite category C^op has the elements of the set as objects, 
and a unique arrow from c' to c precisely when c ≤ c'. 

Definition 4. Let C be a complete Heyting algebra; a presheaf F on C is a 
functor from C^op to Set. That is, for each c ∈ C there is a set F(c), and for 
c, d ∈ C such that c ≤ d, there is a restriction function F_{c≤d} : F(d) → F(c), 
subject to the following conditions: 

• F_{c≤c} = id_{F(c)}, the identity on the set F(c); and 

• if c ≤ d ≤ e, then F_{d≤e}; F_{c≤d} = F_{c≤e}. 


Notation 1. For a presheaf F on C, if c ≤ d in C and x ∈ F(d), we often write 
x|_c for F_{c≤d}(x). 


A sheaf is a presheaf which allows families of consistent local observations to 
be pasted together to give a global observation. 


Definition 5. A presheaf F is a sheaf iff it satisfies the following pasting 
condition: 

• if c = ⋁_{i∈I} c_i and x_i ∈ F(c_i) is a family of elements for i ∈ I such that 
  x_i|_{c_i∧c_j} = x_j|_{c_i∧c_j} for all i, j ∈ I, then there is a unique x ∈ F(c) such that 
  x|_{c_i} = x_i for all i ∈ I. 

A morphism of sheaves θ : F → G is just a natural transformation from F to G 
viewed as presheaves. 
We write Sh_M for the category of sheaves over Ω(M), where M is a monoid. 
Given θ : F → G, naturality of θ says that θ respects restrictions: given 
Y ⊆ X, and e ∈ F(X), 

    θ_Y(e|_Y) = θ_X(e)|_Y . 


Example 3. For S ⊆ M, if we write Ω(S) for the prefix-closed subsets of S, then 
Ω is a sheaf over Ω(M); given an inclusion X ⊆ Y of prefix-closed sets, then 
Ω_{X⊆Y} takes V ⊆ Y (i.e., V ∈ Ω(Y)) to V ∩ X ⊆ X (i.e., V ∩ X ∈ Ω(X)). 
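
The sheaf of prefix-closed subsets can be explored on small examples. The sketch below assumes the free monoid of words over an alphabet, encoded as Python strings, and works only with finite sets of words; the names `down`, `is_prefix_closed`, `omega` and `restrict` are illustrative. It enumerates the prefix-closed subsets of a finite set and implements restriction as intersection.

```python
from itertools import combinations


def down(word):
    """All prefixes of `word`, including the empty word and `word` itself."""
    return {word[:i] for i in range(len(word) + 1)}


def is_prefix_closed(words):
    """True if every prefix of every word in `words` is again in `words`."""
    words = set(words)
    return all(down(w) <= words for w in words)


def omega(words):
    """All prefix-closed subsets of the finite set `words`, as frozensets."""
    ordered = sorted(words)
    return [
        frozenset(subset)
        for size in range(len(ordered) + 1)
        for subset in combinations(ordered, size)
        if is_prefix_closed(subset)
    ]


def restrict(section, smaller):
    """Restriction of a prefix-closed set along an inclusion: intersection."""
    return frozenset(section) & frozenset(smaller)
```

For two prefix-closed sets `X1` and `X2`, sections `V1` over `X1` and `V2` over `X2` that agree on the intersection paste to the union `V1 | V2`, which is a section over `X1 | X2` restricting to `V1` and `V2`.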


Example 4 (Eventually φ). Let T be a pointed transition system on L* with a 
distinguished initial state t_0 ∈ T, and a subset of states φ ⊆ T. The functor 
◇φ : Ω(L*) → Set defined by 

    ◇φ(X) = ∏_{w∈X} { p : w↓ → T | p(ε) = t_0 ∧ 
                       (∃w' ∈ L*, p' : w'↓ → T) w ≤ w' ∧ p'|_{w↓} = p 
                       ∧ p'(w') ∈ φ } 

is a sheaf. 

We will look in more detail at limits in Section 3; again we note 

Proposition 2. The category Sh_M is complete. 


Limits are constructed pointwise from limits in Set. For example, given 
sheaves F and G, their product F × G is defined by (F × G)(X) = F(X) × G(X). 


2.3 An Adjunction Between Transition Systems and Sheaves 


We present a translation from transition systems to sheaves, and a translation 
from sheaves to transition systems. The main result of this section is that these 
translations form an adjunction. 

Given a transition system T, we construct a sheaf from T by considering 
sets of paths in T, as in Example 1. Recall that a (forking) path in T was just 
a transition system morphism f : X → T for some X ∈ Ω(M). If we have 
f_1 : X_1 → T and f_2 : X_2 → T such that f_1|_{X_1∩X_2} = f_2|_{X_1∩X_2}, then clearly these 
functions can be uniquely pasted to give a morphism, or path, X_1 ∪ X_2 → T. 


Definition 6. The functor Sh_M : LTS_M → Sh_M is defined by, for a prefix-closed 
subset X ∈ Ω(M), 

    Sh_M(T)(X) = LTS_M(X, T) . 

Note that, because every X ∈ Ω(M) is a transition system, LTS_M(_, T) is a 
functor Ω(M)^op → Set, and so this definition also applies for morphisms (i.e., 
inclusions) in Ω(M). That is, restriction in Sh_M(T) is restriction of paths. 
For f : T → U in LTS_M, the natural transformation 

    Sh_M(f) : Sh_M(T) → Sh_M(U) 

is defined by saying that for each X ∈ Ω(M), the component Sh_M(f)_X takes a 
T-path h : X → T to the U-path h; f : X → U. This is in fact the action of 
the functor LTS_M(X, _) on f, and naturality of Sh_M(f) is a consequence of this 
fact. 
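
The sheaf of paths can be computed directly for small systems. The following sketch assumes the free monoid of words over single-character labels, a finite prefix-closed set of words as the domain, and a transition system given by a finite set of triples; the names `STATES`, `STEP`, `X1`, `X2`, `paths` and `restrict` are hypothetical. A path is represented as a dictionary from words to states, and it suffices to check single-label transitions, because transitions over words are generated by those.

```python
# A small transition system T, given by its states and single-label transitions.
STATES = [0, 1, 2]
STEP = {(0, "a", 1), (0, "a", 2), (1, "a", 2), (1, "b", 0), (2, "a", 2)}

# Two prefix-closed sets of words whose union forks at the word "a".
X1 = frozenset({"", "a", "aa"})
X2 = frozenset({"", "a", "ab"})


def paths(domain, states, step):
    """All morphisms from the prefix-closed set `domain` to the system."""
    words = sorted(domain, key=len)  # every proper prefix comes before its extensions
    found = []

    def extend(i, partial):
        if i == len(words):
            found.append(dict(partial))
            return
        word = words[i]
        if word == "":
            candidates = list(states)
        else:
            source = partial[word[:-1]]
            candidates = [t for t in states if (source, word[-1], t) in step]
        for target in candidates:
            partial[word] = target
            extend(i + 1, partial)
            del partial[word]

    extend(0, {})
    return found


def restrict(path, smaller):
    """Restriction of a path to a smaller prefix-closed set of words."""
    return {word: path[word] for word in smaller}
```

With these definitions, any path over `X1` and any path over `X2` that agree on `X1 & X2` arise as the restrictions of exactly one path over `X1 | X2`, which is the pasting condition for this pair of sets.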


Going the other way, we represent a sheaf by its set of ‘elements’ (m, e), where 
m ∈ M and e ∈ F(m↓). Transitions on these states are given by the restriction 
actions of F. 


Definition 7. The functor Tr_M : Sh_M → LTS_M is defined by 

    Tr_M(F) = Σ_{m∈M} F(m↓) . 

Transitions in this system are defined by (m, e) --n--> (m', e') iff m' = mn and 
e'|_{m↓} = e. For natural transformations θ : E → F in Sh_M, 

    Tr_M(θ) : Tr_M(E) → Tr_M(F) 

takes (m, e) ∈ Tr_M(E) to (m, θ_{m↓}(e)) ∈ Tr_M(F). 
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Our main result of this section is that Sh m gives the ‘underlying’ sheaf of a 
transition system. 


Theorem 1. Trm is left adjoint to Shy,. 


Proof. The unit of the adjunction is given by nr : F > Shu (Trai(F)), which, 
for X € 2(M), takes e € F(X) to the path mapping x € X to (x, els). For 
any transition system T, a morphism h : F — Shm(T) uniquely extends to 
h? : Term (F) > T, which takes (m,e) € Trm(F) to hm (e)(m). 


This generalises a result of Winskel that gives an adjunction between standard 
transition systems and presheaves [3]. The adjunction applies more to presheaves 
than to sheaves. A further twist can be given by considering pointed transition 
systems. Let $F$ be a sheaf; then $\mathrm{Tr}_M(F)$ can be made a pointed transition system
by designating $(\varepsilon, *)$ as initial state, where $*$ is the unique element of $F(\{\varepsilon\})$.
Morphisms of pointed transition systems preserve initial states, so if we
take $\mathrm{Sh}_M(\mathrm{Tr}_M(F))(X)$ to be the set of pointed transition system morphisms
from $X$ (with initial state $\varepsilon$) to $\mathrm{Tr}_M(F)$, then any such morphism corresponds
uniquely to a consistent family of elements $e_m \in F(m{\downarrow})$ for $m \in X$, which,
since $F$ is a sheaf, corresponds uniquely to an element $e \in F(X)$. Thus, if we
specialise the above adjunction to pointed transition systems, the unit of the 
adjunction is an isomorphism. 

In the next section, we look at how behaviour of composite systems arises 
through limit constructions. Since $\mathrm{Sh}_M$ is a right adjoint, our translation from
transition systems to underlying sheaves preserves limits, and therefore behaviour:


Corollary 1. $\mathrm{Sh}_M$ preserves limits.


3 Hierarchical Systems 


In this section we explore the notion of behaviour as limit for transition systems 
built from component parts. We start by allowing transition systems to vary over 
the monoids of their labels, and extend the completeness results of the previous 
section to this setting. Correspondingly, we also introduce morphisms between 
the ‘trace’ topologies of sheaves, and extend the adjunction of Theorem 1 to
hierarchically structured transition systems. 

We give an example based on the coffee dispenser of Example 2, which shows
that the appropriateness of limits as giving behaviour of composite systems 
depends, to some extent, on our ‘state-based’ approach to transition systems. 

Finally, we use a generalisation of the notion of sheaf to show that the notion 
of behaviour as limit is, in itself, sheaf-theoretical. 


We begin by noting that the category Mon of monoids and monoid homomorphisms
is complete. Limits of monoid homomorphisms capture the notion of
synchronisation on actions.

Example 5. The pullback of $f_i : M_i \to M$ ($i = 1, 2$) is the monoid with underlying
set $\{(x_1, x_2) \in M_1 \times M_2 \mid f_1(x_1) = f_2(x_2)\}$, unit $\varepsilon = (\varepsilon, \varepsilon)$ and composition
defined by $(x_1, x_2)(y_1, y_2) = (x_1 y_1, x_2 y_2)$, together with first and second
projections to $M_1$ and $M_2$ respectively.

As a particular example, let $M = \{c\}^*$, $M_1 = \{a, b, c\}^*$ and $M_2 = \{c, d\}^*$,
with $f_1(a) = f_1(b) = \varepsilon = f_2(d)$, and $f_1(c) = c = f_2(c)$. Then the pullback
contains all pairs $(x, y)$ in $\{a, b, c\}^* \times \{c, d\}^*$ where $x$ and $y$ contain the same
number of c’s.

We can think of the monoid homomorphisms as taking a sequence of actions
in some system $M_i$ and ‘restricting’ them to a sequence of actions in a subsystem
$M$. In the particular pullback described above, the common subsystem $M$ has
only one action, c. We can think of these ‘words’ as expressing sequences of
actions where the action of c is synchronised in $M_1$ and $M_2$.

Also note that 


$$(abc, dcd) = (\varepsilon, d)\,(ab, \varepsilon)\,(c, c)\,(\varepsilon, d) = (ab, \varepsilon)\,(\varepsilon, d)\,(c, c)\,(\varepsilon, d)$$


so unsynchronised actions from different components can occur in any order. 
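
The particular pullback above can be checked mechanically. The following is a minimal sketch, assuming words of the free monoids are represented as Python strings (with the empty string as the unit); the function names are illustrative and not part of the paper.

```python
def f1(word):
    """Homomorphism {a,b,c}* -> {c}*: erases a and b, keeps c."""
    return "".join(ch for ch in word if ch == "c")


def f2(word):
    """Homomorphism {c,d}* -> {c}*: erases d, keeps c."""
    return "".join(ch for ch in word if ch == "c")


def in_pullback(pair):
    """A pair (x, y) lies in the pullback iff f1(x) = f2(y)."""
    x, y = pair
    return set(x) <= set("abc") and set(y) <= set("cd") and f1(x) == f2(y)


def compose(*pairs):
    """Componentwise concatenation; the empty product is the unit ('', '')."""
    return ("".join(p[0] for p in pairs), "".join(p[1] for p in pairs))


# The two factorisations of (abc, dcd) given in the text.
factorisation_1 = [("", "d"), ("ab", ""), ("c", "c"), ("", "d")]
factorisation_2 = [("ab", ""), ("", "d"), ("c", "c"), ("", "d")]
```

Both factorisations multiply out to the same element, and every factor is itself in the pullback, whereas a pair such as (c, ε), in which c is not matched, is excluded.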


3.1 Behaviour as Limit 


We begin by considering morphisms between transition systems over different 
label monoids. 


Definition 8. The category LTS has objects $(M, T)$, where $M$ is a monoid, and
$T$ is a labelled transition system over $M$. A morphism $\varphi : (M, T) \to (M', T')$ is a pair
$\varphi = (f, g)$, with $f : M \to M'$ a monoid homomorphism, and $g : T \to T'$ such
that if $t \xrightarrow{m} t'$, then $g(t) \xrightarrow{f(m)} g(t')$.


Again, these morphisms can be thought of as expressing a restriction to a 
subsystem. 


Example 6. Recall the coffee dispenser of Example 2 as a transition system over
$M = \{d, n, r\}^*$. We give an example of a morphism from the coffee dispenser to
a simple coin slot that can accept coins. The state set of the coin-slot transition
system is Bool, indicating whether a coin has been inserted. There are two labels
for transitions: c for a coin being inserted, and e for ending a transaction (the
coffee is dispensed and the coin chinks into the money box). The transitions are
defined exhaustively by:


— $\mathit{false} \xrightarrow{c} \mathit{true}$
— $\mathit{true} \xrightarrow{e} \mathit{false}$


We describe a morphism from the coffee dispenser to the coin slot. The monoid
homomorphism on labels is defined by

$$d \mapsto e, \qquad n \mapsto c, \qquad r \mapsto \varepsilon$$

and the mapping from the state of the coffee dispenser (Bool × {0..20}) to that
of the coin slot (Bool) is just the first projection. It is straightforward to check
that these maps preserve transitions; for example, in the coffee dispenser,

$$(\mathit{true}, N) \xrightarrow{d} (\mathit{false}, N - 1)$$

for all $0 < N \le 20$, which translates to $\mathit{true} \xrightarrow{e} \mathit{false}$ in the coin slot.
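
The check that these maps preserve transitions can be sketched in code. This is only an illustration under stated assumptions: the d-transition of the dispenser is taken from the text, while the n- and r-transitions (notify of a coin, refill to 20) are assumptions, since the dispenser is specified in an earlier example; the unit ε is assumed to act as the identity on states.

```python
EPS = ""  # the unit (empty word) of a free monoid

# Label homomorphism on generators, as given in the text.
F = {"d": "e", "n": "c", "r": EPS}


def dispenser_step(state, label):
    """One-letter transitions of the coffee dispenser (None if undefined)."""
    coin, cups = state
    if label == "d" and coin and cups > 0:   # stated in the text
        return (False, cups - 1)
    if label == "n" and not coin:            # assumption: a coin is notified
        return (True, cups)
    if label == "r":                         # assumption: refill to 20 cups
        return (coin, 20)
    return None


def coin_slot_step(state, label):
    """Transitions of the coin slot; the unit acts as the identity."""
    if label == EPS:
        return state
    if label == "c" and state is False:
        return True
    if label == "e" and state is True:
        return False
    return None


def g(state):
    """State mapping: first projection Bool x {0..20} -> Bool."""
    return state[0]


def violations():
    """Dispenser transitions whose image is not a coin-slot transition."""
    bad = []
    for coin in (False, True):
        for cups in range(21):
            for label in "dnr":
                target = dispenser_step((coin, cups), label)
                if target is None:
                    continue
                if coin_slot_step(g((coin, cups)), F[label]) != g(target):
                    bad.append(((coin, cups), label, target))
    return bad
```

Under these assumptions `violations()` is empty, which is exactly the statement that the pair (f, g) is a morphism in the sense of Definition 8.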


In this example, we think of the coffee dispenser as actually comprising a 
coin slot as a subsystem. This is perhaps somewhat unnatural; a more realistic 
description might have both the coffee dispenser and the coin slot sharing a 
common subcomponent (essentially just the boolean value of the example above). 
We hope that the familiarity of this example to readers will make them more 
disposed to indulge such simplifications. 

The case of two systems sharing a common subcomponent is treated in the 
next example, which illustrates 


Proposition 3. The category LTS is complete. 


Limits are taken componentwise, consisting of a limit of monoids, together 
with a limit of the associated transition systems. 


We saw in Example 6 how a morphism from a coffee dispenser to a coin slot
expressed the idea that the coin slot was a subcomponent of the coffee dispenser,
so that coffee was only dispensed after a coin had been put in the slot. We first
present another morphism, from a money box to the coin slot (so that coins
put in the slot eventually end up in the money box), and then show how the
limit of these two morphisms behaves.


Example 7. A money box with a coin slot as a subcomponent can be specified as 
having states that are pairs whose first component is a boolean value, specifying 
whether there is a coin in the slot, and whose second component is a natural 
number specifying how many coins are in the money box. 

Transitions are labelled by c for a coin entering the slot, t for the coin being 
taken from the slot to the money box, and m for all the money being taken out 
of the box. Transitions are defined exhaustively by 


— $(\mathit{false}, N) \xrightarrow{c} (\mathit{true}, N)$ for all $N \ge 0$,
— $(\mathit{true}, N) \xrightarrow{t} (\mathit{false}, N + 1)$ for all $N \ge 0$, and
— $(B, N) \xrightarrow{m} (B, 0)$ for all $N \ge 0$ and $B \in \mathit{Bool}$.


The monoid homomorphism to the coin slot is given by

$$c \mapsto c, \qquad t \mapsto e, \qquad m \mapsto \varepsilon$$

and the first projection $(B, N) \mapsto B$ gives the mapping on states.

We now have two mappings to the coin slot, indicating that this is a subobject
of both the coffee dispenser and the money box:

$$CD \xrightarrow{(f_1, g_1)} CS \xleftarrow{(f_2, g_2)} MB$$

The limit of the monoid morphisms gives a label monoid of

$$\{(u, v) \in \{d, n, r\}^* \times \{c, t, m\}^* \mid f_1(u) = f_2(v)\} \qquad (1)$$


Since the coffee dispenser and money box synchronise on coins entering and
leaving the coin slot, this requires every d (dispense coffee) to be paired with a t
(take the coin), and every n (notify there’s a coin in the slot) to be paired with
a c (coin in the slot). Thus, the pullback monoid of labels is effectively the same
as lists over {dt, nc, r, m}, but where occurrences of r and m (the unsynchronised
actions) commute. Here dt represents the synchronised event of coffee being dispensed
and the coin being taken from the slot, i.e., dt = (d, t), while nc represents the
synchronised event of a coin being put in the slot and the coffee dispenser
being notified of this, i.e., nc = (n, c). For example,

$$nc\; r\; m\; dt\; m = nc\; m\; r\; dt\; m$$


are equal because it does not matter what order unsynchronised events occur 
in — if you like, they occur in separate frames of reference with no notion of 
simultaneity applying. Both sequences represent a coin being put into the slot, 
then the machine being refilled and its money box emptied (in either order, or 
even ‘at the same time’), then coffee being dispensed, and then the money box 
being emptied once again. 

The state set of the limiting transition system is 


$$\{(b, x, b', y) \in \mathit{Bool} \times \{0..20\} \times \mathit{Bool} \times \mathit{Nat} \mid g_1(b, x) = g_2(b', y)\}.$$


Since in this case both $g_1$ and $g_2$ are the first projection, the requirement is simply
that b = b’. In other words, the state of the common coin-slot subcomponent 
is shared by the coffee dispenser and the money box. Any changes in the coin 
slot’s state must occur in both the coffee dispenser and the money box. 

This synchronisation on a shared subcomponent is again reflected in the 
transitions of the limiting transition system: essentially, a label $(u, v)$ as in (1)
represents a $u$-transition on the coffee-dispenser part together with a $v$-transition
on the money-box part. Formally,

$$(b, x, b, y) \xrightarrow{(u, v)} (b', x', b', y') \quad\text{iff}\quad (b, x) \xrightarrow{u} (b', x') \text{ and } (b, y) \xrightarrow{v} (b', y').$$

For example,

$$(\mathit{true}, x, \mathit{true}, y) \xrightarrow{(d, t)} (\mathit{false}, x - 1, \mathit{false}, y + 1)$$

for $0 < x \le 20$ and $y \ge 0$, represents coffee being dispensed and the coin being
taken from the coin slot to the money box. 
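
The limiting transition system of this example can be sketched as follows. This is a hedged illustration, not the paper's construction verbatim: the money-box transitions and the d-transition are taken from the text, the n- and r-transitions of the dispenser are assumptions, and only the four generating labels of the pullback monoid are handled.

```python
EPS = ""  # unit of the label monoids


def dispenser_step(state, label):
    """Coffee dispenser over {d, n, r}; n and r are assumed, d is from the text."""
    coin, cups = state
    if label == EPS:
        return state
    if label == "d" and coin and cups > 0:
        return (False, cups - 1)
    if label == "n" and not coin:
        return (True, cups)
    if label == "r":
        return (coin, 20)
    return None


def money_box_step(state, label):
    """Money box over {c, t, m}, as specified in the text."""
    coin, coins = state
    if label == EPS:
        return state
    if label == "c" and not coin:
        return (True, coins)
    if label == "t" and coin:
        return (False, coins + 1)
    if label == "m":
        return (coin, 0)
    return None


# Generators of the pullback monoid: dt, nc, r and m.
SYNC_LABELS = {("d", "t"), ("n", "c"), ("r", EPS), (EPS, "m")}


def limit_step(state, label):
    """A (u, v)-step is a u-step of the dispenser together with a v-step of the box."""
    b, x, b2, y = state
    if b != b2 or label not in SYNC_LABELS:
        return None
    u, v = label
    left = dispenser_step((b, x), u)
    right = money_box_step((b, y), v)
    if left is None or right is None or left[0] != right[0]:
        return None
    return (left[0], left[1], right[0], right[1])


def run(state, labels):
    """Follow a sequence of generating labels; None if some step is undefined."""
    for label in labels:
        if state is None:
            return None
        state = limit_step(state, label)
    return state
```

Running the two words `nc r m dt m` and `nc m r dt m` from the same state reaches the same state, in line with the observation that the unsynchronised actions r and m commute.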


3.2 Sheaves of Hierarchical Systems 


We extend the relationship described in Section 2 between transition systems
and sheaves to the hierarchical systems described in the previous subsection. 
This gives a much more interesting notion of topology that arises from the way 
a hierarchical system is composed from its subcomponents. By ‘more interesting’, 
we mean that such a topology gives a more realistic notion of location at which to 
observe a system. We conclude by observing that the behaviour-as-limit approach 
to hierarchical systems naturally gives rise to sheaves on such a topology. 

We begin by showing that monoid homomorphisms allow a translation between
topologies $\Omega(M)$.


Definition 9. A monoid homomorphism $f : M \to M'$ extends to a mapping
$\Omega(f) : \Omega(M) \to \Omega(M')$ defined by

$$\Omega(f)(X) = \{\, y \in M' \mid y \le f(x) \text{ for some } x \in X \,\}.$$

That is, $\Omega(f)(X)$ is the prefix-closure of the image $f(X)$.
The morphism also extends, contravariantly, to a functor $\mathrm{Sh}_{M'} \to \mathrm{Sh}_M$,
which we also denote $\Omega(f)$, defined by

$$\Omega(f)(G)(X) = G(\Omega(f)(X))$$

for a sheaf $G$ on $\Omega(M')$.
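
For free monoids, the mapping on prefix-closed sets is easy to compute. A minimal sketch, assuming words are Python strings, the prefix order is the usual one on strings, and the homomorphism is given by its values on generators (the example homomorphism is the one from the coffee dispenser to the coin slot; the function names are illustrative):

```python
def prefixes(word):
    """All prefixes of a word, including the empty word and the word itself."""
    return {word[:i] for i in range(len(word) + 1)}


def prefix_closure(words):
    """Smallest prefix-closed set containing the given words."""
    closed = set()
    for w in words:
        closed |= prefixes(w)
    return closed


def omega(f_on_generators, X):
    """Prefix-closure of the image of X under the induced homomorphism."""
    image = {"".join(f_on_generators[ch] for ch in w) for w in X}
    return prefix_closure(image)


# Label homomorphism from the coffee dispenser to the coin slot.
f = {"d": "e", "n": "c", "r": ""}

# A prefix-closed set of dispenser words: notify, refill, dispense.
X = prefix_closure({"nrd"})
```

Here `omega(f, X)` is the set containing the empty word, `c` and `ce`: the refill action is invisible to the coin slot, so the words `n` and `nr` have the same image.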


Now we can define morphisms between sheaves on different trace topologies. 
Corresponding to the category LTS, we have a category Sh, where, intuitively, 
morphisms correlate to restrictions to subsystems. 


Definition 10. The category Sh has objects $(M, F)$, where $M$ is a monoid, and
$F$ is a sheaf on $\Omega(M)$. A morphism $(M, F) \to (M', F')$ is a pair $(f, \theta)$, where
$f : M \to M'$ is a monoid homomorphism, and $\theta : F \to \Omega(f)(F')$.

Given $(f, \theta) : (M_1, F_1) \to (M_2, F_2)$ and $(g, \kappa) : (M_2, F_2) \to (M_3, F_3)$, the
composite $(f, \theta); (g, \kappa)$ is $(f; g,\ \theta; \kappa_{\Omega(f)}) : (M_1, F_1) \to (M_3, F_3)$. To make sense
of the second component, note that for $X \in \Omega(M_1)$,

$$\theta_X : F_1(X) \to \Omega(f)(F_2)(X) = F_2(\Omega(f)(X))$$

and

$$\kappa_{\Omega(f)(X)} : F_2(\Omega(f)(X)) \to \Omega(g)(F_3)(\Omega(f)(X)) = F_3(\Omega(f; g)(X)).$$


This is an example of a Grothendieck category, with the following consequence 


(see, e.g., BOJ):

Theorem 2. The functor $\mathrm{Sh}_M$ extends to a functor $\mathrm{Sh} : \mathrm{LTS} \to \mathrm{Sh}$, taking
$(M, T)$ to $(M, \mathrm{Sh}_M(T))$; also, $\mathrm{Tr}_M$ extends to $\mathrm{Tr} : \mathrm{Sh} \to \mathrm{LTS}$, taking $(M, F)$
to $(M, \mathrm{Tr}_M(F))$. Moreover, Tr is left adjoint to Sh.


As a corollary, since Sh is a right adjoint, it preserves limits, and therefore 
preserves the behaviour of a composite system constructed by taking limits, as 
in Example 7. We might use this to verify a property such as the ‘eventually-$\varphi$’
property of Example 4 by constructing an element of $\Diamond\varphi(X)$ by showing that
every word $w \in X$ can be extended to a path ending in a state where $\varphi$ holds.
However, this $\Omega(M)$ topology really only says that the branching behaviour of
a transition system arises by pasting together linear paths $w{\downarrow} \to T$. A more
powerful approach to capturing global properties through local properties would 
be to construct sheaves on a topology representing the hierarchical structure of 
a composite system. Such topologies can arise through downwards-closure. 


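Let $X$ be a preorder category, with a unique morphism $x \to y$ whenever
$x \ge y$. For example, $X$ might have objects 0, 1 and 2, with $0 \le 1$ and $0 \le 2$;
a functor $\delta : X \to \mathrm{LTS}$ would then represent two transition systems
with a shared subcomponent. The completion $\Omega(X)$ of downward-closed subsets
of $X$, in this example, looks like

$$
\begin{array}{ccccc}
 & & \{0,1,2\} & & \\
 & \diagup & & \diagdown & \\
\{0,1\} & & & & \{0,2\} \\
 & \diagdown & & \diagup & \\
 & & \{0\} & & \\
 & & | & & \\
 & & \emptyset & &
\end{array}
$$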


which represents all of the parts of the system: downwards-closure means that 
subcomponents are always included in a ‘part’. Note that the example shows that 
moving from $X$ to $\Omega(X)$ is very like moving from the basis of a pullback diagram
to a pullback diagram. The top element that is added, {0,1,2}, corresponds to
the limit, i.e., to the behaviour of the entire system, while {0,1} corresponds
to the system on the left, together with its ‘component’, 0. In Example 7 this
latter would be the coffee dispenser with its component coin slot.
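
The completion by downward-closed subsets can be enumerated directly for a small preorder. A minimal sketch, assuming the preorder is given as an explicit set of pairs (a, b) meaning a ≤ b; the function name is illustrative:

```python
from itertools import combinations


def down_closed_subsets(elements, leq):
    """All subsets S such that x in S and y <= x imply y in S."""
    result = []
    for size in range(len(elements) + 1):
        for subset in combinations(elements, size):
            s = set(subset)
            if all(y in s for x in s for y in elements if leq(y, x)):
                result.append(frozenset(s))
    return result


# The example preorder: 0 <= 1 and 0 <= 2 (plus reflexivity).
ORDER = {(0, 0), (1, 1), (2, 2), (0, 1), (0, 2)}


def leq(a, b):
    return (a, b) in ORDER


parts = down_closed_subsets([0, 1, 2], leq)
```

For this preorder the result consists of the five sets ∅, {0}, {0,1}, {0,2} and {0,1,2}; the subsets {1}, {2} and {1,2} are excluded because they omit the shared subcomponent 0.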


Rather than extend the machinery of the previous sections to sheaves on
topologies $\Omega(X)$, we conclude by showing that the notion of behaviour as limit
is itself sheaf-like. For this, we need a generalisation of the notion of sheaf that
seems to be due to Gray (cf. [14], Chapter 18), and allows for sheaves that take
values in categories other than Set:

Definition 11. A sheaf with values in a category $\mathcal{L}$ is a functor $F$ from a
complete Heyting algebra to $\mathcal{L}$ such that if $X = \bigvee_{i \in I} X_i$, then

$$F(X) \longrightarrow \prod_{i \in I} F(X_i) \rightrightarrows \prod_{i, j \in I} F(X_i \wedge X_j)$$

is an equaliser diagram (where all the arrows arise from the obvious restrictions
by the universal property of the target product).


The universal property of the equaliser diagram expresses that consistent families
of ‘elements’ can be uniquely pasted together: a unique arrow to $F(X)$ arises from
an arrow to $\prod_{i \in I} F(X_i)$ that equalises the parallel arrows, that is, ‘elements’ of
each $F(X_i)$ that agree on overlaps $X_i \wedge X_j$.

The following follows directly from general properties of limits: 


Proposition 4. Let $X$ be a preorder category, and let $\delta : X \to \mathrm{LTS}$. Define
$\delta^* : \Omega(X) \to \mathrm{LTS}$ by $\delta^*(X) = \lim(\delta|_X)$; then $\delta^*$ is a sheaf of transition systems.


This states that the behaviour of a composite system arises by pasting to- 
gether the behaviours (i.e., limits) of its components. We could combine this 
with Theorem 2 to obtain that $\delta^*; \mathrm{Sh}$ is a sheaf of sheaves: the behaviour of a
composite system arises by pasting together consistent paths through its compo- 
nent parts. We might thus, for example, verify that particular paths are possible 
globally by verifying that their restrictions to subcomponents are possible locally. 

We can also show that the approach goes beyond an interleaving model 
of concurrency. One attempt to capture asynchronous or ‘true concurrency’ is 
Winskel and Nielsen’s notion [24] of transition systems with independence; these 
are transition systems with a relation, |, on transitions, which specifies inde- 
pendence between transitions (e.g., they can occur truly concurrently). Such an 
independence relation is required to satisfy: 


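$$
\begin{array}{l}
t \xrightarrow{m} t_1 \;\sim\; t \xrightarrow{m} t_2 \;\Rightarrow\; t_1 = t_2 \\[6pt]
t \xrightarrow{m} t_1 \mid t \xrightarrow{n} t_2 \;\Rightarrow\; (\exists u)\;\; t \xrightarrow{m} t_1 \mid t_1 \xrightarrow{n} u \;\wedge\; t \xrightarrow{n} t_2 \mid t_2 \xrightarrow{m} u \\[6pt]
t \xrightarrow{m} t_1 \mid t_1 \xrightarrow{n} u \;\Rightarrow\; (\exists t_2)\;\; t \xrightarrow{m} t_1 \mid t \xrightarrow{n} t_2 \;\wedge\; t \xrightarrow{n} t_2 \mid t_2 \xrightarrow{m} u \\[6pt]
t \xrightarrow{m} t_1 \;\sim\; t_2 \xrightarrow{m} u \mid w \xrightarrow{n} w' \;\Rightarrow\; t \xrightarrow{m} t_1 \mid w \xrightarrow{n} w'
\end{array}
$$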





where $\sim$ is the equivalence relation freely generated by $\prec$, which is defined by
$t \xrightarrow{m} t_1 \prec t_2 \xrightarrow{m} u$ iff there is an $n$ with $t \xrightarrow{m} t_1 \mid t \xrightarrow{n} t_2$,
$t \xrightarrow{m} t_1 \mid t_1 \xrightarrow{n} u$ and $t \xrightarrow{n} t_2 \mid t_2 \xrightarrow{m} u$.
We can give a simpler characterisation of independence
for sheaves of transition systems. Suppose $F$ is a sheaf of transition systems on
$\Omega(X)$, as in the example above; then transitions $t_1 \xrightarrow{m} t_1'$ and $t_2 \xrightarrow{n} t_2'$ in the
limit are independent iff $m|_{\{0,1\}} = \varepsilon$ and $n|_{\{0,2\}} = \varepsilon$ (or vice versa w.r.t. 1 and
2). That is, transitions are independent iff they are local to separate parts of the
system. More generally, given $C = C_1 \cup C_2$, transitions at $F(C)$ are independent
iff one restricts to $\varepsilon$ at $C_1$ and the other restricts to $\varepsilon$ at $C_2$.
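
For the coffee dispenser and money box, this characterisation can be sketched as a simple test on labels. The sketch assumes that a label of the limit is a pair (u, v) as in the pullback monoid, that restriction to the part {0,1} (the dispenser) is the first projection, and that restriction to the part {0,2} (the money box) is the second; these identifications are an illustration rather than a construction given in the paper.

```python
EPS = ""  # unit of the label monoids


def restrict(label, part):
    """Restriction of a limit label (u, v) to one part of the system."""
    u, v = label
    return u if part == "dispenser" else v


def independent(label_1, label_2):
    """Independent iff the two labels are local to separate parts."""
    return (
        restrict(label_1, "dispenser") == EPS and restrict(label_2, "money_box") == EPS
    ) or (
        restrict(label_1, "money_box") == EPS and restrict(label_2, "dispenser") == EPS
    )
```

With these definitions the refill label (r, ε) and the emptying label (ε, m) are independent, while the synchronised label (d, t) is not independent of (ε, m), since it is visible in both parts.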





Proposition 5. Independence of transitions for sheaves gives transition systems 
with independence in the sense of Winskel and Nielsen [24].


418 Grant Malcolm 
4 Conclusion 


We have presented an adjunction between transition systems and sheaves on 
a topology of traces. The functor from transition systems to sheaves is right 
adjoint, and therefore preserves limits, which we consider to be the behaviour of 
hierarchical systems of transition systems. 

The eventual aim of this work (from which we are still a long way off) is to 
provide semantic foundations for reasoning about hierarchical, distributed con- 
current systems. Of paramount importance in such an endeavour are coherent 
notions of behaviour and observation. In this paper we have adopted the prin- 
ciple that the behaviour of composite, hierarchical, systems is given by a limit, 
and our results add to the argument that viewing sheaves as systems of 
observations is coherent with the notion of behaviour as limit. As the examples 
in this paper illustrate, the notion of behaviour as limit leads to a more ‘state- 
based’ view of transition systems. Often, states in transition systems are viewed 
as little more than ‘place holders’ between transitions; there are advantages to 
such an approach, indeed it is almost necessary for process calculi and the study 
of bisimulation, and one area for future work is to relate the state-based approach 
to these established and successful fields. 

Labelled transition systems are one of the fundamental structures in concur- 
rency, and this paper establishes some relationships between transition systems 
and sheaves. It seems quite possible to further develop this, and relate sheaves 
usefully to other fundamental structures. To some extent, this has already been 
done, for example Monteiro and Pereira [19] consider event systems, Ehrich et 
al [6] and Goguen [Ii] apply sheaf-theoretic machinery to concurrent object sys- 
tems, while Monteiro applies related concepts to coalgebra. These all seem 
to be quite separate threads of development, and it would be instructive to find 
some means of drawing them together. One possibility lies in Lawvere’s notion 
of ‘control category’, which determines a structure on observations, and is used 
by Bunge and Fiore [2] to give a general framework for considering concurrent 
processes.


Dedicated to Joseph Goguen on his 65th birthday


Abstract. This paper studies uniformity conditions for endofunctors 
on sets following Aczel [1], Turi [27], and others. The “usual” functors
on sets are uniform in our sense, and assuming the Anti-Foundation 
Axiom AFA, a uniform functor H has the property that its greatest 
fixed point H* is a final coalgebra whose structure is the identity map. 
We propose a notion of uniformity whose definition involves notions from 
recent work in coalgebraic recursion theory: completely iterative monads 
and completely iterative algebras (cias). Among our new results is one 
which states that for a uniform H, the entire set-theoretic universe V is 
a cia: the structure is the inclusion of HV into the universe V itself. 


1 Introduction 


I have considered Joseph Goguen to be one of my main teachers for many years. 
My first encounter with him was in an undergraduate course in the theory of 
computation given at UCLA around 1979. What I remember most is that se- 
rious students had to both write research papers and take an oral final exam, 
and looking back I see it as both a didactic move and a way to take seriously 
the thoughts of students. After hearing of my interests in mathematics and lin- 
guistics, he suggested that I write on representing inexact concepts in Montague 
grammar, thereby mixing topics that he considered interesting: formal semantics 
and fuzzy logic. Later, I took a graduate seminar that he and Charlotte Linde 
taught on natural language processing. I remember their strong opposition to 
generative grammar and advocacy of views that there was no “real world.” Both 
of these were a real surprise. I also remember Joseph’s sense of humor as well as 
his more serious side. 

A few years later, I was a post-doc at Stanford’s Center for the Study of 
Language and Information. Joseph had moved to SRI a few years earlier and 
was also at CSLI. I don’t know how we started, but he and José Meseguer 
started meeting to decide on something to work on together. They pointed me 
to a conjecture of theirs on abstract data type computability which I settled and 
wrote up in a paper with the two of them. One of them told me I was getting 
“on-the-job training” in category theory, and this very paper is also on-the-job 
training. I also remember Joseph’s delightful influence all over CSLI during that
time. I mainly lost touch with him after that, though at some point after he
moved to UCSD we met again through my oldest friend, Martin Schapiro. 

For me, among the most long-lasting influences were various pointers to currents
in the social sciences, along with indications that these deserved to be
taken seriously in computer science and artificial intelligence. Another was the 
mingling and mixing of ideas from Western science and Eastern religions; de- 
spite being raised in Los Angeles and with Alan Watts’ lectures regularly on 
my radio, I scarcely had met anyone who lived the ideas. I am still inspired by 
his wide-ranging concerns and penetrating insights into many subjects. In all of 
these, I am reminded of a character in the story of his namesake the Joseph of 
Genesis, the “man” in 37:15-17, someone who points people to important places 
and ideas. It is a pleasure to thank Joseph for his many years of direct inspiration 
to me and wish him many more years. 


1.1 Whatever Happened to the Study of Recursive Program 
Schemes? 


The title of our volume is “Algebra, Meaning, and Computation,” and so I want 
to make the case that my contribution is related to all three points. As in the 
distantly-related areas of the semantics of natural language, the semantic project 
in computer science is to give some sort of mathematical model of meaning. 
Today the semantic project attracts less attention and prestige than the study 
of algorithmic complexity. (This is especially true in the USA.) Still, the area is 
important because if one wants to be sure that computer programs ‘do what they 
are supposed to’, then one quickly needs formally specified and tractable notions 
of meaning. I think it is fair to say that the centerpiece of the semantic project 
concerning computation is the treatment of recursion, the main mechanism of 
‘looping’ for computer programs and algorithms. The work reported here is an 
offshoot of coalgebraic recursion theory, an application of ideas from coalgebra 
and closely related fields to circular phenomena and more recently to recursive 
program schemes. Many of the mathematical tools that are now common in 
semantics were first introduced for the study of recursive program schemes. I 
would like to think that some of the notions in coalgebraic recursion theory also 
will enter the mainstream of semantic research. I also think that some of this 
work allows one to approach the semantics of computation from an even more 
algebraic perspective than previous studies. For example, one of Joseph Goguen’s 
early papers mentions the use of initial continuous algebras in connection with 
recursion. As a result of very recent work, it turns out that one can dispense with 
the domain-theoretic underpinnings of continuous algebras (or more precisely, 
one has a clearer understanding of the principles that make continuous algebras 
work in the first place). None of this is a main point in this paper, however,
and so I will only touch on these matters in passing. For a longer discussion, one 
could see [15] or [6]. 

Given the importance of recursion and recursive program schemes, one has 
to ask why the subject is not pursued so intensively these days. Here are two
possible answers: (a) the work that had been done was mathematically challenging,
requiring expertise in both algebra and domain theory; and at the same
time it was not clear that the results coming out of it justified one’s mastery of 
the field. And (b), there were many easier things to do, and many of them had 
a closer connection to computer science practice. 


This paper is actually about a different subject, but with a connection: our 
concern here is the notion of uniformity for functors on sets that goes back 
to Peter Aczel’s book [1] on non-wellfounded sets. The book contains a short
discussion of what it called the Special Final Coalgebra Theorem, a result that 
gives a sufficient condition for a functor on the category of sets (actually the 
category of classes) to have a final coalgebra whose carrier is the greatest fixed 
point of the functor and whose structure map is the identity. This is a natural 
matter to investigate from the point of view of the book. However, the particular 
condition given was difficult to understand and work with, and so it fell to other 
researchers to clarify the matter. This has been the subject of a number of other 
papers such as [19, 18, 22, 21]. This paper is another in the same line. It revisits
the discussion in the light of concepts introduced by Adámek and his coworkers: 
mainly completely iterative monads and algebras. This paper formulates a new 
notion of uniformity for functors and studies it under AFA, and it also obtains 
some new results. 

To read this paper, one should be conversant with the basics of category 
theory. At the same time, readers with this background only (that is, readers 
who have not worked with set theory or non-wellfounded sets) are likely to 
find the whole issue in this paper uninteresting. The reason for this is that we 
are interested in properties of functors which are not preserved under natural 
isomorphism. These properties are defined in terms of inclusion morphisms and 
greatest fixed points, and neither of these are preserved in this way. (However, 
the referee of this paper points out to me that the topic of the paper could be 
more interesting to those who have seen inclusion systems). Returning to the 
intended audience, it would also help to have seen the background notions that 
we expound in Section 2, but we have attempted to be concise and as (only as)
complete as necessary. 


2 Background 


In this section, we present the background that we need in two parts: background 
from coalgebra, and background from set theory. In both cases, the background 
will be unusual. From coalgebra, we need a set of definitions from a handful of 
recent papers. I doubt that most people who glance at this paper will have heard 
of any of these notions, and I also know that the spare presentation here will 
not really help one to get a feeling for the substantial work in the area. On the 
set theoretic side, most of the background concerns non-standard subjects such 
as non-wellfounded sets and functors on the category of classes. 
The next two sections may be read in either order.


2.1 Background from Coalgebra


Let $\mathcal{A}$ be a category with a fixed finite coproduct operation $\oplus$.$^1$ An endofunctor
$H : \mathcal{A} \to \mathcal{A}$ is iteratable [sic] if for each object $A$, the functor $H(-) \oplus A$ has a
final coalgebra. This condition of iteratability is satisfied by many functors of
interest; it is perhaps most pertinent to this paper that results of Aczel and
Mendler [4], later strengthened by Adámek et al. [5], show that every endofunctor
on the category of classes is iteratable. In the setting of this paper, it will be
important to remember that the power set functor is iteratable on the category
of classes but not on the category of sets. On the other hand the subfunctors
$\mathcal{P}_\kappa$ are iteratable on the category of sets; here $\mathcal{P}_\kappa(X)$ is the set of subsets of $X$
whose cardinality is at most $\kappa$.

If $H$ is iteratable, then for each object $A$ we have a final $H(\_) \oplus A$-coalgebra
$(TA, \alpha_A)$. As the notation indicates, $T$ extends to a functor and $\alpha$ to a natural
transformation. $T$ has many properties, but only a few are explicitly needed in
this paper. For example, we need at one point that in the endofunctor category
$[\mathcal{A}, \mathcal{A}]$, the functor $G \mapsto (H \cdot G) \oplus \mathrm{Id}$ has $(T, \alpha)$ as a final coalgebra. Moreover, $H$
brings not only $T$ but also a free completely iterative monad. For our purposes, a
completely iterative monad based on $H$ is a monad $\mathbb{T} = (T, \mu, \eta)$ together with
natural transformations $\alpha : T \to HT \oplus \mathrm{Id}$ and $\tau : HT \to T$ such that


1. For all objects $A$ of $\mathcal{A}$, $(TA, \alpha_A)$ is a final coalgebra of $H(\_) \oplus A$.
2. $[\tau, \eta] : HT \oplus \mathrm{Id} \to T$ is the pointwise inverse of $\alpha$.
3. Every suitably guarded equation morphism has a unique solution.


Actually, the first point is the key; the others are consequences and/or strengthenings
of it. We shall not need the precise formulation of the last point, so we
omit it.

We shall always write $\kappa$ for $\tau \cdot H\eta$. In general, an ideal natural transformation
into $T$ is one that factors through $\tau$; so $\kappa$, for example, is ideal.


Proposition 1. The diagrams below commute: 


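$$
\begin{array}{ccc}
HTT & \xrightarrow{\;\tau T\;} & TT \\
{\scriptstyle H\mu}\downarrow & & \downarrow{\scriptstyle \mu} \\
HT & \xrightarrow{\;\tau\;} & T
\end{array}
\qquad\qquad
\begin{array}{ccc}
HT & \xrightarrow{\;\tau\;} & T \\
 & {\scriptstyle \kappa T}\searrow & \uparrow{\scriptstyle \mu} \\
 & & TT
\end{array}
$$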


Completely iterative algebras. The following notion is studied in Milius [14] and
other papers. Let $H : \mathcal{A} \to \mathcal{A}$ be an endofunctor. By a flat equation morphism
in an object $A$ (of parameters) we mean a morphism of the form

$$e : X \to HX \oplus A.$$


1 We are using the symbol $\oplus$ in this section rather than the more usual symbol +
for coproducts. In this paper, + will denote the specific coproduct on sets or classes
given by the Kuratowski pairing operation (see Section 2.2). We use the different
notations to help the reader with this distinction.

Let $(A, a : HA \to A)$ be an $H$-algebra. We say that $s : X \to A$ is a solution of
$e$ in $(A, a)$ if the square

$$
\begin{array}{ccc}
X & \xrightarrow{\;e\;} & HX \oplus A \\
{\scriptstyle s}\downarrow & & \downarrow{\scriptstyle Hs \oplus A} \\
A & \xleftarrow{\;[a, A]\;} & HA \oplus A
\end{array}
$$

commutes. (Note that we often use the name of an object (such as $A$) as a name
of the identity morphism on it.) And we call $(A, a)$ a completely iterative algebra
(or cia) for $H$ if every flat equation morphism in $A$ has a unique solution in it.


Example 1 We present a suggestive example, not so much for this paper but 
for other discussions. It is based on ideas of Peter Freyd [12] which figure in his 
presentation of the unit interval as the carrier of a final coalgebra structure on 
a certain category. Let $\mathcal{A}$ be the category of sets, let $HX = X + X$ in the usual
way, and let $I$ be the unit interval $[0,1]$. Consider the following algebra
$a : I + I \to I$: $a(\mathrm{inl}(x)) = x/2$, and $a(\mathrm{inr}(x)) = (x + 1)/2$. It turns out that $(I, a)$
is a cia for $H$. In elementary terms, this means that every system of equations
such as the one below has a unique solution in $I$:


= i 1 = ï 
vy = 302 t3 ty = gus F 
T3 = ane T5 = x6 + 7 
3 = 54 tq = 247 


On the right we can have one of three things: a variable multiplied by 1/2, the 
sum of 1/2 and a variable multiplied by 1/2, or a constant from [0,1]. The cia 
property says that every such system, even one with an infinite or uncountable 
set of variables, has a unique solution in [0,1]. Incidentally, the easiest way to 
establish the cia property is to argue via more general results on complete metric 
spaces, eventually using the Banach Fixed Point Theorem. 
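
For a finite system, the unique solution can be approximated by simple iteration, since each step halves the distance between any two candidate solutions. The sketch below is an illustration only: the representation of a flat system as a dictionary and the sample system are hypothetical and are not the system displayed in the paper.

```python
def solve(system, iterations=60):
    """Approximate the unique solution of a flat system over the unit interval.

    Each variable maps to one of:
      ("inl", y)   meaning  x = y / 2
      ("inr", y)   meaning  x = (y + 1) / 2
      ("const", c) meaning  x = c, with c in [0, 1]
    """
    values = {x: 0.0 for x in system}
    for _ in range(iterations):
        updated = {}
        for x, (kind, arg) in system.items():
            if kind == "inl":
                updated[x] = values[arg] / 2
            elif kind == "inr":
                updated[x] = (values[arg] + 1) / 2
            else:
                updated[x] = arg
        values = updated
    return values


# A hypothetical sample system (not the one from the text).
sample = {
    "x1": ("inr", "x1"),   # x1 = (x1 + 1) / 2, so x1 = 1
    "x2": ("inl", "x3"),   # x2 = x3 / 2
    "x3": ("inr", "x2"),   # x3 = (x2 + 1) / 2, so x2 = 1/3 and x3 = 2/3
    "x4": ("const", 0.25),
}

solution = solve(sample)
```

The iteration is the contraction argument in miniature: the update map has Lipschitz constant 1/2 in the supremum metric, so the error after k rounds is at most 2 to the power -k. For infinite or uncountable systems the same argument applies, but the solution is no longer computed by a finite loop.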


In the statement below and wherever we refer to inverses of various maps, 
recall that final coalgebra morphisms are always categorical isomorphisms. 


Proposition 2 (AMV [6], Milius [14]). Concerning cias for H and com- 
pletely iterative monads: 

1. If (A, a⁻¹) is a final coalgebra for H, then (A, a) is a cia for H. 

2. For every object A, (TA, τ_A) is a cia for H. 

3. For every cia (A, a) for H, the solution to the flat equation morphism α_A is 
an Eilenberg-Moore algebra of the monad T. We write this solution morphism 
as ã : TA → A. 

4. Moreover, for every cia (A, a) for H, the triangle 

\[
\begin{array}{ccc}
HA & \xrightarrow{\;a\;} & A \\
 & {\scriptstyle \kappa_A}\searrow & \uparrow{\scriptstyle \tilde a} \\
 & & TA
\end{array}
\]

commutes. 

A solution principle Given a final coalgebra (A, a⁻¹) for an endofunctor H, we 
can map any H-coalgebra uniquely into it. In this paper, we will often want to 
map other kinds of morphisms into it. This matter is related to the “flattening” 
constructions that one finds in the theory of non-wellfounded sets. For example, 
if we have f : X → TX we shall want to define something like a coalgebra 
morphism f† : X → A and say what its properties should be. We shall use the 
notions mentioned in this section. Recall that (A, a) gives a cia for H, and we 
then have an Eilenberg-Moore algebra structure ã : TA → A. Now using ã and 
our f : X → TX, we shall define the map f† so that the diagram below 
commutes: 

\[
\begin{array}{ccc}
X & \xrightarrow{\;f\;} & TX \\
{\scriptstyle f^\dagger}\downarrow & \swarrow{\scriptstyle [\![f^\dagger]\!]} & \downarrow{\scriptstyle Tf^\dagger} \\
A & \xleftarrow{\;\tilde a\;} & TA
\end{array}
\qquad (1)
\]

So there are two tasks. First, we must introduce the ⟦ ⟧ operation on various 
morphisms and then spell out its relevant properties. Then after this we need to 
use this notation in principles of definition. 


Definition 1. Let H be iteratable, let (A, a⁻¹) be a final H-coalgebra, and let 
T be the associated monad. Recall that ã : TA → A is the Eilenberg-Moore 
algebra structure associated to A; it is the solution to the flat equation morphism 
α_A : TA → HTA ⊕ A with parameters in A. For any morphism of the form 
f : B → A, we let ⟦f⟧ : TB → A be given by 

\[ [\![f]\!] = \tilde a \cdot Tf. \]


Lemma 1. Once again, let (A, a⁻¹) be a final H-coalgebra. Here are some 
properties of the morphisms ⟦f⟧, where f : B → A. 

1. ⟦f⟧ · η_B = f. 

2. ⟦f⟧ · τ_B = a · H⟦f⟧. 

3. ⟦id_A⟧ · T⟦f⟧ = ⟦⟦f⟧⟧ = ⟦f⟧ · μ_B, where μ is the multiplication of the monad T. 

4. ⟦f⟧ = ⟦id_A⟧ · Tf. 


The proofs are routine calculations using naturality and the definition of an 
Eilenberg-Moore algebra of a monad. 
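
To indicate what such a calculation looks like, here is a sketch of items 1, 4 and 3. It assumes only that ⟦f⟧ = ã · Tf as in Definition 1, the Eilenberg-Moore laws ã · η_A = id_A and ã · Tã = ã · μ_A, and naturality of η and μ.

```latex
\begin{align*}
[\![f]\!] \cdot \eta_B
  &= \tilde a \cdot Tf \cdot \eta_B
   = \tilde a \cdot \eta_A \cdot f
   = f, \\
[\![\mathrm{id}_A]\!] \cdot Tf
  &= \tilde a \cdot T\mathrm{id}_A \cdot Tf
   = \tilde a \cdot Tf
   = [\![f]\!], \\
[\![\mathrm{id}_A]\!] \cdot T[\![f]\!]
  &= \tilde a \cdot T\tilde a \cdot TTf
   = \tilde a \cdot \mu_A \cdot TTf
   = \tilde a \cdot Tf \cdot \mu_B
   = [\![f]\!] \cdot \mu_B.
\end{align*}
```
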
We also need what would be considered a folkloric result. 


Lemma 2. Let (A, a⁻¹) be a final H-coalgebra, so that (A, a) is a cia for H. 
Let f : X → TX ⊕ A factor through κ_X ⊕ A, say as f = (κ_X ⊕ A) · f₀ with 
f₀ : X → HX ⊕ A. 

Then there is a unique f† : X → A such that f† = [ã, A] · (Tf† ⊕ A) · f. 

Proof We have already mentioned the result in Milius [14] to the effect that 
(A, a) is a cia for H. Thus there is a unique morphism f₀† making the square in 
the upper left below commute: 

\[
\begin{array}{ccccc}
X & \xrightarrow{\;f_0\;} & HX \oplus A & \xrightarrow{\;\kappa_X \oplus A\;} & TX \oplus A \\
{\scriptstyle f_0^\dagger}\downarrow & & \downarrow{\scriptstyle Hf_0^\dagger \oplus A} & & \downarrow{\scriptstyle Tf_0^\dagger \oplus A} \\
A & \xleftarrow{\;[a, A]\;} & HA \oplus A & \xrightarrow{\;\kappa_A \oplus A\;} & TA \oplus A
\end{array}
\]

together with [ã, A] : TA ⊕ A → A, which closes the triangle 
[ã, A] · (κ_A ⊕ A) = [a, A]. 

The triangle commutes using Proposition 2 and the square on the right by 
naturality of κ. So the outside of the figure commutes, showing that f₀† is a 
morphism with the properties requested in our result. And if g : X → A is any 
morphism making the outside of the figure commute, then the square at the 
upper left commutes. Thus we have g = f₀†. This establishes the uniqueness of 
solutions. ⊣ 


As this section comes to a close, we look back at the diagram in (1). We now 
have the promised result that this diagram defines f† uniquely from f, assuming 
the relevant guardedness condition. 


Lemma 3. Let (A, a⁻¹) be a final H-coalgebra, and let f : X → TX factor 
through κ_X : HX → TX. Then there is a unique f† : X → A such that 

\[ f^\dagger = [\![f^\dagger]\!] \cdot f. \]

Proof Apply Lemma 2 to inl · f : X → TX ⊕ A. There is a unique g : X → A 
such that 

\[ g = [\tilde a, A] \cdot (Tg \oplus A) \cdot \mathrm{inl} \cdot f = \tilde a \cdot Tg \cdot f = [\![g]\!] \cdot f. \]

We take g for the needed morphism f†. For the uniqueness, if g = ⟦g⟧ · f, then 
the same calculations as above show that g = [ã, A] · (Tg ⊕ A) · inl · f; hence we 
are done by Lemma 2. ⊣ 


We emphasize that the background in this section only contains a hint of a 
more extensive subject that is currently an active area. Not only have I omitted 
many motivational points connected to recursive program schemes, first- and 
second-order substitution, the very interesting notion of an Elgot algebra, and 
the like; I also have not even mentioned all of the results that this paper will call 
upon. Two places to read about all 
of this and more are Stefan Milius’ dissertation [15] and the paper on recursive 
program schemes and coalgebra [16]. 


2.2 Background from Set Theory 


We remind the reader of the basic facts of set theory which will be relevant in 
this paper. 

The Kuratowski ordered pair (a, b) of two sets a and b is {{a}, {a, b}}. In 
terms of this one defines and studies relations, functions, and the like. One also 
defines versions of the natural numbers by: 0 = ∅, 1 = {0}, etc. Finally, we shall 
fix a coproduct operation + on sets by 

\[
a + b = (\{0\} \times a) \cup (\{1\} \times b) = \{(0,x) : x \in a\} \cup \{(1,y) : y \in b\}.
\]

For sets a and b, the coproduct injections inl : a → a + b and inr : b → a + b are 
then given by 

\[ \mathrm{inl}(x) = (0,x), \qquad \mathrm{inr}(y) = (1,y). \]

Henceforth in this paper, the symbol + is used for this operation on sets 
(extended in the natural way to classes). 

For any set a, ⋃a is the set of elements of elements of a. A set a is transitive 
if ⋃a ⊆ a. The transitive closure of a is 

\[ \mathrm{tc}(a) = a \cup \bigcup a \cup \bigcup\bigcup a \cup \cdots. \]


This is a set, and it is the smallest transitive set (under the inclusion ordering) 
which includes a. 
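
These definitions can be tried out on hereditarily finite sets. The sketch below models pure sets as nested Python `frozenset` values; the function names are chosen here for illustration, and the loop in `transitive_closure` terminates only because every set involved is wellfounded and finite.

```python
EMPTY = frozenset()
ZERO = EMPTY                  # 0 = the empty set
ONE = frozenset({ZERO})       # 1 = {0}


def kpair(a, b):
    """Kuratowski ordered pair (a, b) = {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def coproduct(a, b):
    """a + b = ({0} x a) U ({1} x b), built from Kuratowski pairs."""
    return frozenset({kpair(ZERO, x) for x in a} | {kpair(ONE, y) for y in b})


def big_union(a):
    """The set of elements of elements of a."""
    return frozenset(x for member in a for x in member)


def is_transitive(a):
    return big_union(a) <= a


def transitive_closure(a):
    """tc(a) = a U Ua U UUa U ..., for hereditarily finite wellfounded a."""
    result = set(a)
    layer = frozenset(a)
    while layer:
        layer = big_union(layer)
        result |= layer
    return frozenset(result)


TWO = frozenset({ZERO, ONE})
singleton_one = frozenset({ONE})   # the set {1}, which is not transitive
```

Here `transitive_closure(singleton_one)` adds the missing element 0 and produces the transitive set 2 = {0, 1}.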

If a ⊆ b, we write i_{a,b} for the inclusion map of a into b. If b = V, then we 
generally drop it from the notation. So if a ⊆ b, we have i_a = i_b · i_{a,b}. 

Note also that if a is transitive, then a ⊆ Pa. Further i_{a,Pa} : a → Pa is a 
P-coalgebra, and i_a : a → V is a P-coalgebra morphism from it to (V, i_{V,PV} = id_V). 

The axioms of set theory are not about sets as much as they are about 
the universe of sets. One of the intuitive principles of the theory is that arbi- 
trary collections of mathematical objects “should be” sets. Due to paradoxes, 
this intuitive principle is not directly formalized in standard set theories. In a 
sense, the axioms one does have are intended to give enough sets to constitute 
a mathematical universe while not having so many as to risk inconsistency. But 
it is natural in this connection to consider some collections of objects which are 
demonstrably not sets. These are called proper classes. The term class informally 
refers to a collection of mathematical objects. Classes are usually not first-class 
objects in set theory (certainly they are not in the most standard set theory, 
ZFC). Instead, a statement about classes is regarded as a paraphrase for some 
other (more complicated and usually less intuitive) statement about sets. This 
is probably not a good place to discuss the details of the formalization; one clear 
source is Chapter 1 of Azriel Levy’s book [13] on set theory. For our purposes, 
classes may be taken as definable subcollections of sets. For example, if a is any 
set, then the class of all sets which do not contain a as an element is {x : a ∉ x}. 
The class V of all sets is {x : x = x}. The definability here is in the first-order 
logic with just a symbol ∈ for membership, and the quantifiers range over sets 
(not classes). If C is a class, the power class of C is 

\[ P(C) = \{x : x \text{ is a set, and } (\forall y)(y \in x \to \varphi_C(y))\}, \]

where φ_C is the formula that defines the class C. 

We are interested in functors H on sets and classes which are monotone in 
the sense of preserving inclusions among objects: if a ⊆ b, then Ha ⊆ Hb. 

Each set-based² monotone operation H on classes has a least fixed point H_* 
and a greatest fixed point H*. For the least fixed point, we first define classes H_α 
by transfinite recursion: H_0 = ∅, H_{α+1} = H(H_α), and for limit λ, H_λ = ⋃_{β<λ} H_β. 
Then the class H_* is defined by x ∈ H_* iff (∃α) x ∈ H_α. The assumption that H 
be set-based, together with the Replacement Axiom, implies that H_* is a fixed 
point of H, and it is easy to see by induction that each H_α is a subset of any 
fixed point of H. So H_* is the least fixed point. In categorical terms, (H_*, id) 
is an initial H-algebra on the category of classes. We are especially concerned 
with the dual concept, greatest fixed points. As shown in Aczel [1], 

\[ H^* = \bigcup \{ b : b \text{ is a set and } b \subseteq Hb \}. \]

H* might well be a proper class. 
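
A finite analogue may help fix ideas. The sketch below is an illustration only: it replaces classes by subsets of a four-element universe and uses a hypothetical operator `H` sending a subset s to the points all of whose children lie in s. It computes the least fixed point by iterating from the empty set, the greatest by iterating from the whole universe, and checks the latter against the union of all b with b ⊆ Hb.

```python
from itertools import combinations

UNIVERSE = frozenset({0, 1, 2, 3})
# A small graph: 2 has a self-loop, so no finite stage ever reaches it.
CHILDREN = {0: set(), 1: {0}, 2: {2}, 3: {1, 2}}


def H(s):
    """Monotone operator: the points all of whose children lie in s."""
    return frozenset(x for x in UNIVERSE if CHILDREN[x] <= s)


def least_fixed_point(op):
    """Iterate op upward from the empty set until it stabilizes."""
    current = frozenset()
    while op(current) != current:
        current = op(current)
    return current


def greatest_fixed_point(op):
    """Iterate op downward from the whole universe until it stabilizes."""
    current = UNIVERSE
    while op(current) != current:
        current = op(current)
    return current


def union_of_post_fixed_points(op):
    """The union of all subsets b with b contained in op(b)."""
    result = set()
    for size in range(len(UNIVERSE) + 1):
        for subset in combinations(sorted(UNIVERSE), size):
            b = frozenset(subset)
            if b <= op(b):
                result |= b
    return frozenset(result)


lfp = least_fixed_point(H)
gfp = greatest_fixed_point(H)
```

In this example the least fixed point consists of the points from which no infinite path starts, while the greatest fixed point is the whole universe.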

For example, by Cantor’s Theorem there are no sets which are fixed points 
of the power set functor, but on classes, the least fixed point exists and indeed is 
the class WF of wellfounded sets. Another fixed point is the class V of all sets. 
Saying that PV = V just means that every set of sets is a set, and that every set 
is a set of sets. (So this would contradict any axiom of urelements, and indeed 
usually set theories implicitly do not allow for urelements.) Note that i_V, i_{PV}, 
Pi_V, and Pi_{PV} all denote the same operation, the identity on the universe. 

Here are some further examples to orient the reader. The identity functor 
has the universe V as its greatest fixed point on the category of classes. The 
identity has no greatest fixed point on sets. But even on classes, the greatest 
fixed point is not the carrier of a final coalgebra structure, since that would 
be a mere singleton set. But consider the variant functor H(a) = 1 × a. Here 
there are some differences, even though H is naturally isomorphic to the identity. 
Whether H has any fixed points besides ∅ is a question that is sensitive to the 
underlying set theory. Under the Foundation Axiom, the empty set ∅ is the only 
fixed point of H. Under the Anti-Foundation Axiom (formulated shortly), H has 
one additional fixed point (which therefore is the greatest fixed point): there is 
a unique set a such that a = {(0,a)} (this uses AFA). And so b = {a} satisfies 
b = {0} × {a} = 1 × b. Moreover, b is the only set with this property except for ∅. 

In any case, the overall point is that properties of the greatest fixed points of 
various operations are sensitive to the underlying set theory. The topics of this 
paper are certain classes which form either final coalgebras or cias for various 
functors. Again, such classes do not exist in the usual set theory ZFC, due mainly 
to the Foundation Axiom. In this connection, and in connection with other 
coalgebraic notions, it is more natural to work in the set theory ZFA obtained 
from ZFC by replacing the Foundation Axiom with a ‘dual’ statement, the 
Anti-Foundation Axiom first formulated by Forti and Honsell and then popularized 
in Peter Aczel’s book [1]. 

² The condition of set-based-ness introduced in Aczel [1] turned out to be unnecessary 
for functors on classes: see [5,9,10]. As a result, we suppress mention of this condition. 


The Anti-Foundation Axiom The Anti-Foundation Axiom (AFA) is the 
assertion that for every set b and every e : b → Pb, there exists a unique s : b → V 
such that s = Ps · e: 

\[
\begin{array}{ccc}
b & \xrightarrow{\;e\;} & Pb \\
 & {\scriptstyle s}\searrow & \downarrow{\scriptstyle Ps} \\
 & & V
\end{array}
\qquad (2)
\]

The map s is called the solution to the system e. 
To see how this is used, we mentioned above that under AFA, there is a unique 
set a = {(0,a)}. To see this, we let b = {v, w, x, y, z} and consider e : b → Pb 
given by 

\[
\begin{array}{ll}
e(v) = \{w\} & e(y) = \{v, z\} \\
e(w) = \{x, y\} & e(z) = \emptyset \\
e(x) = \{z\} &
\end{array}
\]

Then if s is as in the statement of AFA, we have s(v) = {s(w)}, s(w) = 
{s(x), s(y)}, ..., s(z) = ∅. So s(x) = {∅}, s(y) = {s(v), ∅}, and 

\[ s(w) = \{\{\emptyset\}, \{s(v), \emptyset\}\} = (0, s(v)). \]

Finally, s(v) = {(0, s(v))}. Thus s(v) is a set which solves a = {(0,a)}. It is not 
hard to check that it is the only solution, because any solution to this equation 
gives a solution to the “flat system” e by unraveling a bit. 
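
Solutions of finite systems cannot be written down as finite wellfounded data, but one can still compute which variables receive the same set. The sketch below relies on a standard fact about AFA that is not proved in this section: s(x) = s(y) exactly when x and y are bisimilar in the system e. The function name and the dictionary encoding of e are assumptions made for illustration; the method is naive partition refinement.

```python
def bisimulation_classes(e):
    """Map each variable of a finite system e : b -> P(b) to a class id,
    so that two variables share an id exactly when they are bisimilar."""
    block = {x: 0 for x in e}
    while True:
        # A variable's signature: its current block and its children's blocks.
        signature = {
            x: (block[x], frozenset(block[y] for y in e[x])) for x in e
        }
        ids = {}
        new_block = {x: ids.setdefault(signature[x], len(ids)) for x in e}
        # Each round refines the partition; stop when no block was split.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block
        block = new_block


# The system used in the text above.
e = {"v": {"w"}, "w": {"x", "y"}, "x": {"z"}, "y": {"v", "z"}, "z": set()}
classes = bisimulation_classes(e)

# A second, hypothetical system: p = {p} and q = {q, p}.
e2 = {"p": {"p"}, "q": {"q", "p"}}
classes2 = bisimulation_classes(e2)
```

For the system in the text all five variables land in different classes, so s is injective there. In the second system p and q are bisimilar, reflecting that the solution of p = {p} also satisfies q = {q, p}.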


Lemma 4 (Turi [21], see also [18]). AFA is equivalent to the assertion that 
(V, i_V) = (V, id_V) is a final P-coalgebra. 


Our overall setting in this paper is ZFA. (Actually, many of the results do 
not use AFA, especially those before Section 3.1. But the main results 
of the paper do use it.) 

(By the way, the formulation of AFA in (2) above does not include any 
specific morphism between V and PV. This is basically the way AFA is presented 
in Aczel’s book [1], for example, and also my book with Jon Barwise [8]. The 
disadvantage of this kind of formalization is that it hides the fact that there are 
two different possible assertions: 

\[
\begin{array}{ccc}
b & \xrightarrow{\;e\;} & Pb \\
{\scriptstyle s}\downarrow & & \downarrow{\scriptstyle Ps} \\
V & \xrightarrow{\;(i_{PV})^{-1}\;} & PV
\end{array}
\qquad\qquad
\begin{array}{ccc}
b & \xrightarrow{\;e\;} & Pb \\
{\scriptstyle s}\downarrow & & \downarrow{\scriptstyle Ps} \\
V & \xleftarrow{\;i_{PV}\;} & PV
\end{array}
\]

When one reworks our statement of AFA using the first formulation, one can 
sense the connection to final coalgebras and Lemma 4. The second formulation 
would be closer to what we find in Lemma 6.) 

The main problem for this paper and all previous ones on “uniformity” for 
functors is to propose a condition guaranteeing that the greatest fixed point of a 
monotone H be a final coalgebra together with the identity. This paper proposes 
and studies one such condition. 


3 Standard Functors and Monads 


At this point, we have all of the background we need to begin our study. The first 
concept we need is that of a standard functor on sets or classes. An endofunctor 
H is standard if H preserves inclusion maps in the sense that Hi_{a,b} = i_{Ha,Hb}. 
This notion was introduced in a slightly stronger form in Adámek and Trnková’s 
book [7]; Theorem 3.4.5 of that book shows that every functor on sets is naturally 
isomorphic to a standard functor in their sense. 


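Proposition 3. The coproduct + derived from the Kuratowski pair has the 
property that for all classes c, the endofunctor (−) + c is standard. 


The proof is an easy calculation. Of course, the functors c + (−) are also standard. 
Here is a consequence of these: Let x ⊆ x′ and y ⊆ y′. Then the diagram 

\[
\begin{array}{ccc}
x & \xrightarrow{\;\mathrm{inl}\;} & x + y \\
{\scriptstyle i_{x,x'}}\downarrow & & \downarrow{\scriptstyle i_{x+y,\,x'+y'}} \\
x' & \xrightarrow{\;\mathrm{inl}\;} & x' + y'
\end{array}
\qquad (3)
\]

commutes. 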
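
The calculation behind Proposition 3 can be checked on small finite sets. In the sketch below, Python tuples stand in for Kuratowski pairs and functions are represented as dictionaries; both choices, and all names, are assumptions made for illustration.

```python
def coproduct(a, b):
    """a + b, with Python tuples standing in for Kuratowski pairs."""
    return {(0, x) for x in a} | {(1, y) for y in b}


def inclusion(a, b):
    """The inclusion map i_{a,b}, represented as a dict."""
    assert a <= b, "inclusion requires a to be a subset of b"
    return {x: x for x in a}


def plus_c(f, c):
    """Action of the functor (-) + c on a function f given as a dict."""
    result = {(0, x): (0, fx) for x, fx in f.items()}
    result.update({(1, y): (1, y) for y in c})
    return result


a, b, c = {1, 2}, {1, 2, 3}, {"p", "q"}
lhs = plus_c(inclusion(a, b), c)                    # (i_{a,b}) + c
rhs = inclusion(coproduct(a, c), coproduct(b, c))   # i_{a+c, b+c}
```

Both sides are the same dictionary, which is the standardness condition for (−) + c at the inclusion of a into b.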

Definition 2. Let T be the free completely iterative monad on H. T is standard 
if for each a, Ta = HTa + a, and moreover α_a = id_{Ta}. 

Lemma 5. Let T be the free completely iterative monad on a standard functor 
H. If T is a standard monad, then T is a standard functor. 


Proof Let a ⊆ b, and write i for i_{a,b}. We know that Ti is the unique map such 
that Ti · τ_a = τ_b · HTi. (This follows from the Substitution Theorem 
applied to η_b · i.) But if we take Ti to be i_{Ta,Tb}, then the equation is satisfied: 

\[ i_{Ta,Tb} \cdot \tau_a = \tau_b \cdot i_{HTa,HTb} = \tau_b \cdot H i_{Ta,Tb}. \]

In this we are using the fact that τ is inl for a standard functor, and also equation 
(3). So by uniqueness, Ti = i_{Ta,Tb}. ⊣ 


3.1 P Generates a Standard Iterative Monad 


We check here that under AFA, P generates a standard iterative monad. The 
general idea of our work is to use this fact to show that many other functors also 
generate standard iterative monads. In fact, our definition of uniformity effects 
such a reduction. 


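Lemma 6 (See [18]). Let H be standard. The following are equivalent: 


1. (H*, id) is a final H-coalgebra. 

2. (V, i_{HV}) is a coalgebra-final H-algebra: for every class b and every e : b → 
Hb, there exists a unique solution s : b → V, a morphism such that s = 
i_{HV} · Hs · e: 

\[
\begin{array}{ccc}
b & \xrightarrow{\;e\;} & Hb \\
{\scriptstyle s}\downarrow & & \downarrow{\scriptstyle Hs} \\
V & \xleftarrow{\;i_{HV}\;} & HV
\end{array}
\]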


Proof We show first that (2) implies (1). Consider e : b → Hb and its solution 
s. Let c = s[b] be the image of b under s. Then Hs[Hb] ⊆ H(s[b]) = Hc (see, e.g., 
Proposition 5.1.2 of [18]). Condition (2) in our lemma implies that c ⊆ Hs[Hb], 
and so we see that c ⊆ Hc. Let t : b → c be such that i_c · t = s. Then all parts 
of the diagram on the left below commute, save for the top square. 

\[
\begin{array}{ccc}
b & \xrightarrow{\;e\;} & Hb \\
{\scriptstyle t}\downarrow & & \downarrow{\scriptstyle Ht} \\
c & \xrightarrow{\;i_{c,Hc}\;} & Hc \\
{\scriptstyle i_c}\downarrow & & \downarrow{\scriptstyle i_{Hc,HV} = Hi_c} \\
V & \xleftarrow{\;i_{HV}\;} & HV
\end{array}
\qquad\qquad
\begin{array}{ccc}
b & \xrightarrow{\;e\;} & Hb \\
{\scriptstyle t}\downarrow & & \downarrow{\scriptstyle Ht} \\
c & \xrightarrow{\;i_{c,Hc}\;} & Hc \\
{\scriptstyle i_{c,H^*}}\downarrow & & \downarrow{\scriptstyle i_{Hc,HH^*}} \\
H^* & = & HH^*
\end{array}
\]

(In the diagram on the left, the vertical composites are s : b → V and 
Hs : Hb → HV.) Since inclusions are monic, that part also commutes. This is the 
top square on the right, and so it 
commutes. By the monotonicity of H, we have c ⊆ H*. Thus the bottom square 
on the right commutes, and we see that i_{c,H*} · t is a coalgebra morphism from 
(b, e) to (H*, id). 

Next, we argue the uniqueness of this morphism i_{c,H*} · t. Suppose that f : 
b → H* is any coalgebra morphism. Let c′ = f[b], let t′ : b → c′, and write 
f as i_{c′,H*} · t′. So we have a diagram similar to the one on the right above, 
but with t replaced by t′, and c by c′. The overall outside commutes. And since 
i_{Hc′,HH*} is an inclusion and hence monic, we see that the top square commutes: 
i_{c′,Hc′} · t′ = Ht′ · e. This means that the top square on the left commutes, 
mutatis mutandis. We then take s′ to be i_{c′} · t′ so that the two triangles on the 
left commute. By our statement (2), we have uniqueness of solutions; thus s′ = s. 
It follows that t′ = t and c′ = c. We conclude that f = i_{c′,H*} · t′ = i_{c,H*} · t, as 
desired. 

Now we prove that (1) implies (2). Let (H*, id) be final. We check that 
(2) indeed holds. Let e : b → Hb. We have a final H-coalgebra morphism 
e* : b → H*, and we consider i_{H*} · e*. We see that 

\[
\begin{aligned}
i_{HV} \cdot H(i_{H^*} \cdot e^*) \cdot e
  &= i_{HV} \cdot Hi_{H^*} \cdot (He^* \cdot e) \\
  &= i_{HV} \cdot i_{HH^*,HV} \cdot i_{H^*,HH^*} \cdot e^* \\
  &= i_{H^*} \cdot e^*.
\end{aligned}
\]

This shows that i_{H*} · e* is a solution to e in the sense of point (2) above. For 
the uniqueness, if s is a solution to e, then write s = i_c · t as in the work we did 
above in showing that (2)⇒(1). By the finality of H*, we have e* = i_{c,H*} · t. But 
now s = i_c · t = i_{H*} · i_{c,H*} · t = i_{H*} · e*. ⊣ 


In the next proposition, and in the rest of this paper, we let G_w be the 
constant functor with value w. 


Proposition 4. For every set w, ((P + G_w)*, id) is a final coalgebra for P + G_w. 


Proof We apply Lemma 6. Let e : b → Pb + w. Consider the diagram below: 

\[
\begin{array}{ccccc}
b & \xrightarrow{\;e\;} & Pb + w & \xrightarrow{\;Pb + i_w\;} & Pb + V \\
{\scriptstyle f}\downarrow & & \downarrow{\scriptstyle Pf + w} & & \downarrow{\scriptstyle Pf + V} \\
V & \xleftarrow{\;i_{PV+w}\;} & PV + w & \xrightarrow{\;PV + i_w\;} & PV + V
\end{array}
\]

together with [i_{PV}, V] : PV + V → V, which closes the triangle at the bottom. 

The map f comes from the fact that (V, (i_{PV})⁻¹) is a cia for P. (So note that 
AFA is used here.) Thus the overall outside commutes. The right square easily 
commutes. For the triangle, we use the general fact that i_{a+b} = [i_a, i_b]. (In fact, 
for classes a, b, and c such that a ⊆ c, and b ⊆ c, and a + b ⊆ c, we have 
i_{a+b,c} = [i_{a,c}, i_{b,c}].) We conclude that the left square above commutes. This is 
the existence of the needed f in Lemma 6, and the uniqueness comes from the 
cia structure. ⊣ 


It follows from Proposition 4 that P generates a standard iterative monad on 
the category of classes. 


4 The Class TV and the Map χ 


As we now know, the power set functor determines a free completely iterative 
monad 

\[ T = (T^P, \mu^P, \eta^P). \]

This monad is indeed standard. It also comes with additional natural 
transformations α^P and τ^P. Because this is the most common monad in the rest of the 
paper, we drop the superscripts on all of this data related to it. 

By AFA the inverse of inclusion gives a final coalgebra (i_{PV})⁻¹ : V → PV. 
Because so much of the rest of this paper uses the map ⟦id_V⟧, we shorten the 
notation to write 

\[ \chi = [\![\mathrm{id}_V]\!] : TV \to V. \]

For a mnemonic on this, think of χ for χrunch. As we shall see, it takes elements 
of TV and collapses them back to sets. Those familiar with the Mostowski 
collapse in set theory might think of χ as a kind of non-wellfounded version of that 
map. 

It is worthwhile to get a feeling for the class TV. To understand it better, 
we use Proposition 4, taking V for w. So TV is the greatest fixed point of the 
functor which takes a class X to 

\[ P(X) + V = (\{0\} \times P(X)) \cup (\{1\} \times V). \]

Hence TV is the largest collection C of sets with the property that each member 
of C is of one of the following forms: 

1. (0, x) for some subset x ⊆ C. 
2. (1, x) for some set x. 


Note as well that η : Id → T is defined by η_X(a) = (1, a) for all classes X 
and all sets a ∈ X. As for τ, standardness implies that its components are all 
inclusions. 

We now turn to χ. The elements of TV code sets as follows: 


1. (0, x) codes the set of sets coded by the elements of x. 
2. (1, x) codes x itself. 


The map χ is the decoding map. 
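
On wellfounded, finite codes the decoding can be carried out by plain recursion. The sketch below is an illustration under explicit assumptions: Python tuples `(0, ...)` and `(1, ...)` stand in for the Kuratowski pairs of the text, sets are `frozenset` values, and the recursion would not terminate on a circular code, which is exactly the case where AFA is needed.

```python
EMPTY = frozenset()


def chi(code):
    """Decode a wellfounded element of TV into the set it codes."""
    tag, payload = code
    if tag == 1:
        # (1, x) codes x itself.
        return payload
    if tag == 0:
        # (0, x) codes the set of sets coded by the elements of x.
        return frozenset(chi(member) for member in payload)
    raise ValueError(f"not a code: {code!r}")


a = frozenset({EMPTY})   # a sample set, here {0}
b = EMPTY                # another sample set

singleton_code = (0, frozenset({(1, a)}))
pair_code = (0, frozenset({
    (0, frozenset({(1, a)})),
    (0, frozenset({(1, a), (1, b)})),
}))
```

Here `chi(singleton_code)` is {a}, and `chi(pair_code)` is {{a}, {a, b}}, the Kuratowski pair (a, b).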


Example 2 Here are some examples of χ at work: 

1. For all sets a, χ(1, a) = a, and thus χ(0, {(1, a)}) = {a}. 
2. χ(0, ∅) = ∅. 
3. χ(0, {(0, ∅)}) = {χ(0, ∅)} = {∅}. 
4. χ(0, {(0, ∅), (1, x)}) = {χ(0, ∅), χ(1, x)} = {∅, x}. 
5. For all sets a and b, 

   (0, {(0, {(1, a)}), (0, {(1, a), (1, b)})}) 

   belongs to TV, and χ applied to it is the ordered pair (a, b). 


In all of these, we omit mention of α since it is the identity. 
We record the following application of Lemma 1. 

Proposition 5. Concerning χ : TV → V: 


1. χ · η_V = id_V. 
2. χ · τ_V = i_{PV} · Pχ. 
3. χ · Tχ = ⟦χ⟧ = χ · μ_V. 


434 Lawrence S. Moss 


5 Uniformity 


As our title indicates, this paper is about notions of uniformity for functors on 
sets and classes. We propose a new definition in Section 5.1 below. Before that, 
we want to mention the previous notions of uniformity in the literature, and the 
motivation for them. 

The first place where some notion of “uniform functor” may be found is 
Aczel’s book [1] on non-wellfounded sets. His definition is in terms of the 
“expanded universe ... [which] has an atom x_i for each pure set i.” In our 
terminology, this is exactly PT. (Recall that we are dropping the superscript, writing 
T for T^P.) Were his definition to be translated into our notation, it would look 
similar to ours. It would involve for each class A a map π_A : HA → TA with 
some properties. However, the resulting π is not required to be a natural 
transformation (and indeed, it was not realized until several years later that T was 
even a functor, etc.). As a compensation, the definition requires another 
property on π. Incidentally, I have not worked extensively with Aczel’s definition, but 
it seems to be hard to check that the uniform functors in his sense are closed 
under composition. 

We also emphasize that the first motivation for uniformity is to provide a 
sufficient condition on a monotone functor H that its greatest fixed point H* be 
a final H-coalgebra along with the identity as a structure map. 

The first work to formulate uniformity in terms of natural transformations 
is that of Turi [21] (also presented in Turi and Rutten [22]). Our definition is 
similar to theirs, and to distinguish the two we call theirs TR-uniformity. Its 
definition is in terms of a different monad on sets, the monad W, where WX 
is the least fixed point of Y ↦ PY + X. In addition, there is a unique morphism 
e_V : WV → V such that the composition 

\[
WV \;\cong\; PWV + V \xrightarrow{\;Pe_V + \mathrm{id}_V\;} PV + V \xrightarrow{\;[i_{PV},\, \mathrm{id}_V]\;} V
\]

is e_V. They require of a functor H that there be a natural transformation ρ : 
H → PW such that the following square commutes: 

\[
\begin{array}{ccc}
HV & \xrightarrow{\;\rho_V\;} & PWV \\
{\scriptstyle i_{HV}}\downarrow & & \downarrow{\scriptstyle Pe_V} \\
V & \xleftarrow{\;i_{PV}\;} & PV
\end{array}
\]

(The use of PWV corresponds to our requirement that the natural 
transformations involved in uniformity be ideal.) The main difference is that we use the 
monad T, a larger monad than W; hence more functors are uniform in our sense. 
(For example, the constant functors K_a whose values a are non-wellfounded sets 
are uniform in our sense but not in Turi and Rutten’s sense. Furthermore, 
functors built from K_a in the expected ways will also turn out to be uniform in our 
sense; see Theorem 7.) 

Once again, it is worthwhile mentioning that their motivation for uniformity 
is the same as Aczel’s. However, they recognize that there is also a different 
intuition, one related to substitution: 


Intuitively, an endofunctor on SET is uniform on maps [their 
terminology, following Aczel] if it is completely determined by its action on 
objects (i.e., classes). Most endofunctors are thus uniform on maps. For 
instance, consider the endofunctor X ↦ A × X mapping a class X to 
its product with a fixed class A. Given a function f : X → Y, the value 
of A × f at an element (a, x) of A × X is the pair (a, f(x)) ∈ A × Y 
which is obtained by applying f to the x ∈ X in A × X. This suggests 
that the class X should be regarded as a class of variables and that, 
in general, the action of a functor F uniform on maps on a function f 
should simply be the substitution of the variables x occurring in FX by 
f(x). (Turi [21], p. 211; also Turi and Rutten [22], Sec. 5.5.) 


For other approaches, see Devlin [11] and also Moss and Danner [19]. 

The upshot is that there are two intuitions at work in the definition of 
uniformity, or at least two different goals. One is to search for a condition on functors 
F which guarantees that the greatest fixed point F* of F be a final coalgebra 
with the identity as the structure map. I would like to emphasize, especially for 
readers with a background in category theory, that this kind of question is not 
“preserved under natural isomorphisms of functors”. The identity functor will 
never be uniform under any reasonable definition, but functors like X ↦ 1 × X will 
turn out uniform under AFA. 

A second intuition is mentioned in the quoted paragraph above. We could 
say that this has to do with the class TV and the way that set theory is used to 
represent natural mathematical operations, and also with the matter of coding 
sets by elements of TV. The overall thrust of set theory as a foundational study 
is that natural mathematical operations are representable in a first-order way 
in the universe of sets. It is not always easy to spell out what this means, and 
most textbooks never get around to it. What we are doing in the definition of 
uniformity is to spell out the representability of natural mathematical operations, 
not in terms of first-order logic but in terms of the iterative monad of the 
power set. 


5.1 Our Definition 


We now come to the main definition in this paper. We continue to write T for 
the monad determined by the power set functor, omitting the superscript P in 
most places. We also remind the reader that an ideal natural transformation is 
one which factors through τ. 


Definition 3. A functor H is uniform if there is an ideal natural transformation 
π : H → T such that for all classes a, 

\[ [\![i_a]\!] \cdot \pi_a = i_{Ha}. \]


We call π a uniformity for H. 

Uniformity is equivalent to standardness plus the identity χ · π_V = i_{HV}. This 
says that if we encode HV as a subclass of TV and then collapse back to V via 
χ, we have an inclusion. The reason why we want to do any encoding has to do 
with co-recursion: given e : a → Ha, we want to get a solution satisfying 
an appropriate recursion principle. There is no evident way to do this without 
extra maps. We use π to get a related map e′ : a → T(a). Having this, we use 
Lemma 3 to get a map a → V. 
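
The role of the uniformity here can be spelled out in a short calculation. This is a sketch under stated assumptions: e′ is taken to be π_a · e, the map s : a → V is assumed to satisfy the equation s = ⟦s⟧ · e′ of Lemma 3, and we use ⟦s⟧ = χ · Ts, naturality of π, and the identity χ · π_V = i_{HV}.

```latex
\begin{align*}
s &= [\![s]\!] \cdot e'
   = \chi \cdot Ts \cdot \pi_a \cdot e \\
  &= \chi \cdot \pi_V \cdot Hs \cdot e
     && \text{(naturality of } \pi\text{)} \\
  &= i_{HV} \cdot Hs \cdot e
     && \text{(uniformity at } V\text{)}.
\end{align*}
```

So such an s is a solution of e in the sense of item 2 of Lemma 6.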


Lemma 7. Let π : H → T be an ideal natural transformation. The following 
are equivalent: 


1. H is uniform. 
2. H is standard, and χ · π_V = i_{HV}. 


Proof First, assume that π is a uniformity for H. Then in particular, χ · π_V = 
i_{HV}. The interesting point is to check that H is standard. Let a ⊆ b. In the 
diagram below, 

\[
\begin{array}{ccccc}
Ha & \xrightarrow{\;\pi_a\;} & Ta & & \\
{\scriptstyle Hi_{a,b}}\downarrow & & \downarrow{\scriptstyle Ti_{a,b}} & \searrow{\scriptstyle Ti_a} & \\
Hb & \xrightarrow{\;\pi_b\;} & Tb & \xrightarrow{\;Ti_b\;} & TV \\
{\scriptstyle i_{Hb}}\downarrow & & & & \downarrow{\scriptstyle \chi} \\
V & = & = & = & V
\end{array}
\]

(with i_{Ha} : Ha → V running down the far left) 
everything commutes except the region on the left: the top uses naturality of 
π; the triangle on the right is by applying T to the fact that i_a = i_b · i_{a,b}; and 
uniformity is used in the overall outside and in the bottom square. So we see 
that i_{Ha} = i_{Hb} · Hi_{a,b}. But now we notice a general fact: if x and y are any sets, 
and f : x → y is such that i_x = i_y · f, then x ⊆ y and f = i_{x,y}. It now follows 
that Ha ⊆ Hb and that Hi_{a,b} = i_{Ha,Hb}, as desired. 

Going the other way, suppose H is standard, and χ · π_V = i_{HV}. Let a be any 
class. Return to the diagram above, and replace b by V. Then our assumption 
that χ · π_V = i_{HV} implies that the bottom square commutes, and the region 
on the left is by standardness. It follows that we have the desired uniformity 
equation ⟦i_a⟧ · π_a = i_{Ha}. ⊣ 


The second formulation is often easier to check, since standardness is usually 
immediate for functors. We use Lemma 7 without further mention. 


Our main results The main results of this paper are as follows: the uniform 
functors contain the power set functor and the constants, and they are closed 
under a number of natural operations including composition and iteration. A 
uniform H has the property that H* together with the identity is a final H- 
coalgebra, and V together with the inclusion of HV into it is a cia for H. The 
same generally holds for λ-uniform functors, a notion we introduce in Section 7, 
except that the only constant functors which are λ-uniform are those for sets in 
H_λ. If H is λ-uniform, then H* is a subset of H_λ. 

The rest of this paper is devoted to proofs of these assertions, and some 
additional discussion. 


5.2 Examples and Closure Properties 


Example 3 We establish the uniformity of the power set functor P. This functor 
is easily standard. Let π : P → T be τ · Pη from the iterative monad determined 
by P. Note that π is ideal. Furthermore, 

\[
\begin{aligned}
\chi \cdot \tau_V \cdot P\eta_V &= i_{PV} \cdot P\chi \cdot P\eta_V \\
  &= i_{PV} \cdot P\,\mathrm{id}_V \\
  &= i_{PV}.
\end{aligned}
\]

We used Proposition 5. 


Example 4 Let w be a set; we show that the constant functor G_w with value 
w is uniform. Let w̄ be the transitive closure of w. Since w̄ ⊆ P(w̄), we have 
an inclusion i_{w̄,P(w̄)}. To shorten our notation, we abbreviate this as i in this 
example. We regard i as a natural transformation between constant functors. 
We also have a natural transformation G_w̄ → PG_w̄ → PG_w̄ + Id. By a finality 
result concerning T in the functor category, we have a natural transformation 
π_0 : G_w̄ → T such that π_0 = τ · Pπ_0 · i. It follows easily from this that χ · π_0(V) 
is the inclusion i_{w̄,V} = i_w̄. And the desired ideal natural transformation is π_0 · j, 
where j is the inclusion i_{w,w̄} considered as a natural transformation. 


Example 5 The identity functor I is not uniform. Here are two ways to see this. 
First, we argue directly, by contradiction. Suppose we had an ideal π : I → T 
such that χ · π_V = id_V. Then for all sets a, id_a = χ · i_{T(a),T(V)} · π_a. In short, 
for all x ∈ a, x = χ(π_a(x)). Let a = {0, 1}. Then π_a(0) must be (0, ∅), as π is 
ideal, and χ⁻¹[0] = {(0, ∅), (1, ∅)}. Let f : a → a be the transposition f(0) = 1 
and f(1) = 0. By naturality, π_a · f = Tf · π_a. Applying this to 0, we see that 
π_a(1) = Tf(0, ∅) = (0, ∅). But then we would have 1 = χ · π_a(1) = χ(0, ∅) = ∅; 
this is a contradiction. 

A less elementary way to establish the non-uniformity is to use a result from 
later that for uniform H, the greatest fixed point H* gives a final coalgebra with 
the identity map. For I, we have I* = V. But the final coalgebras of I are the 
singleton sets. So for this reason, I is not uniform. 


Example 6 In contrast to this, the functor H(a) = a + ∅ is uniform; this is the 
same as 

\[ H(a) = 1 \times a = \{(0,x) : x \in a\}. \]

The natural transformation π : H → T is given by 

\[ \pi_a(0,x) = (0, \{(0, \{(0,\emptyset)\}),\ (0, \{(0,\emptyset), (1,x)\})\}). \]

Similar to what we have seen in Example 2 part 5, for all sets x, χ(π_V(0, x)) = 
(0, x): 

\[
\begin{aligned}
\chi((0, \{(0, \{(0,\emptyset)\}),\ (0, \{(0,\emptyset), (1,x)\})\}))
  &= \{\chi(0, \{(0,\emptyset)\}),\ \chi(0, \{(0,\emptyset), (1,x)\})\} \\
  &= \{\{\emptyset\}, \{\emptyset, x\}\} \qquad (*) \\
  &= (0, x).
\end{aligned}
\]

The calculations in the line marked (*) were performed in Example 2. Moreover, 
π is an ideal natural transformation because each π_a(0, x) is an ordered pair 
beginning with 0; more formally, consider π* : H → PT given by 

\[ \pi^*_a(0,x) = \{(0, \{(0,\emptyset)\}),\ (0, \{(0,\emptyset), (1,x)\})\}. \]

Then π = τ · π*. The most tedious part of the verification has to do with the 
naturality of π*. Let f : a → b. Note that Tf(0, ∅) = (0, ∅), and for all x ∈ a, 
Tf(1, x) = (1, fx). It follows that 

\[
Tf(0, \{(0,\emptyset), (1,x)\}) = (0, PTf\{(0,\emptyset), (1,x)\}) = (0, \{(0,\emptyset), (1,fx)\}).
\]

Therefore 

\[
\begin{aligned}
\pi^*_b(0, fx) &= \{(0, \{(0,\emptyset)\}),\ (0, \{(0,\emptyset), (1,fx)\})\} \\
  &= \{Tf(0, \{(0,\emptyset)\}),\ Tf(0, \{(0,\emptyset), (1,x)\})\} \\
  &= PTf\{(0, \{(0,\emptyset)\}),\ (0, \{(0,\emptyset), (1,x)\})\} \\
  &= PTf(\pi^*_a(0,x)).
\end{aligned}
\]
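
The decoding identity χ(π_V(0, x)) = (0, x) can be checked mechanically on small wellfounded sets. The sketch below makes the same assumptions as any finite model would: Python tuples serve as the tags of codes and of elements of 1 × a, sets are `frozenset` values, and the names `chi`, `kpair` and `pi` are chosen for illustration.

```python
EMPTY = frozenset()


def chi(code):
    """Decode a wellfounded code: (1, x) gives x, (0, s) gives {chi(c) : c in s}."""
    tag, payload = code
    if tag == 1:
        return payload
    return frozenset(chi(member) for member in payload)


def kpair(a, b):
    """Kuratowski ordered pair (a, b) = {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def pi(element):
    """The code assigned to an element (0, x) of 1 x a, as in Example 6."""
    tag, x = element
    assert tag == 0, "elements of 1 x a all begin with 0"
    return (0, frozenset({
        (0, frozenset({(0, EMPTY)})),
        (0, frozenset({(0, EMPTY), (1, x)})),
    }))


x = frozenset({EMPTY})
decoded = chi(pi((0, x)))
```

The value `decoded` is {{∅}, {∅, x}}, that is, the Kuratowski pair (0, x) with 0 = ∅.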


The point is that we can “implement” the pairing machinery in a way which 
is recoverable by χ. In a similar fashion, we also have the following result: 


Theorem 7. If F and G are uniform, and a is any set, then the following 
functors are also uniform: F + G, F × G, 1 + F, F^a. 


Proof See [18] for many similar calculations involving the coding machinery. 
⊣ 


Finally, we have the following proposition which shows that our notion of 
uniformity indeed generalizes TR-uniformity as defined in Section 5. This result 
is not needed for the rest of this paper, and the reader may omit it. We shall need 
a property of the monad W. The monad W also carries some extra structure. 
First of all, there is a natural transformation [γ, δ] : PW + Id → W. P may be 
regarded as an endofunctor on the endofunctor category on classes; viz. F ↦ 
P · F. We also get a related functor P · _ + Id. The value of this 
functor at W is PW + Id. So the natural transformation [γ, δ] : PW + Id → W 
may be regarded as a (P · _ + Id)-algebra structure on W. (For that matter, 
[τ, η] : PT + Id → T is another algebra structure.) Moreover, (W, [γ, δ]) is an 
initial algebra of this functor P · _ + Id. By initiality, there is a unique natural 
transformation 6 : W — T giving an algebra morphism from [7, 6] to [7,7]. 
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Proposition 6. Every standard TR-uniform functor H is uniform in our sense. 


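Proof Let $\rho$ establish the TR-uniformity of $H$. Let $\pi = \tau \cdot P\theta \cdot \rho$. Then $\pi$
is ideal. We are going to use Lemma [7]. We must check that the outside of the
figure below commutes:

[Commutative diagram, only partly recoverable from the source. Objects: $HV$, $PWV$, $PTV$, $TV$, $V$, $PV$. Arrow labels: $\rho_V$, $P\theta_V$, $\tau_V$, $\pi_V$, $\iota_{HV}$, $P\varepsilon_V$, $P\chi_V$, $\iota_{PV}$, $\chi_V$.]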

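The square on the left commutes by definition of TR-uniformity, the region at
the top is the definition of $\pi$, and the region at the right and bottom is by
Proposition 5, part 2. For the triangle, we show that $\chi_V \cdot \theta_V = \varepsilon_V$. To do this,
consider the diagram below:

$$\begin{array}{ccccc}
PWV + V & \xrightarrow{\ P\theta_V + id\ } & PTV + V & \xrightarrow{\ P\chi_V + id\ } & PV + V \\
\downarrow {\scriptstyle [\gamma_V, \delta_V]} & & \downarrow {\scriptstyle [\tau_V, \eta_V]} & & \downarrow {\scriptstyle [\iota_{PV}, id_V]} \\
WV & \xrightarrow{\ \theta_V\ } & TV & \xrightarrow{\ \chi_V\ } & V
\end{array}$$

The square on the left commutes by the definition of $\theta$ as an algebra morphism
in the endofunctor category. The square on the right commutes by Proposition [5].
Taken together, the two squares show that $\chi_V \cdot \theta_V$ satisfies the equation which
uniquely defines $\varepsilon_V$. Hence $\chi_V \cdot \theta_V = \varepsilon_V$, as desired. ⊣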


This result, together with our earlier remark about constant functors for 
non-wellfounded sets, shows that the standard TR-uniform functors are a proper 
subcollection of the uniform functors. 

5.3 Closure Under Composition and Iteration 


Theorem 8. If F and G are uniform, then FG is also uniform. 


Proof Let $F$ be uniform via $\pi$, and $G$ uniform via $\rho$. To see that $F \cdot G$ is
uniform, let $\pi * \rho = \pi T \cdot F\rho$, and consider the natural transformation

$$\mu \cdot (\pi * \rho).$$

The following diagram shows this to be an ideal natural transformation:

[Commutative diagram, not recoverable from the source: a triangle and a square whose objects include $FG$, $FT$, $PTT$, $TT$, $PT$ and $T$.]

Here $\pi'$ is a natural transformation with the property that $\pi = \tau \cdot \pi'$; this gives
the triangle above. The commutativity of the square is a part of Proposition []


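Consider next the following diagram:

[Commutative diagram, only partly recoverable from the source. Objects: $FGV$, $FTV$, $TTV$, $FV$, $TV$, $V$. Arrow labels include $F\rho_V$, $\pi_{TV}$, $F\chi_V$, $T\chi_V$, $\iota_{FGV,FV}$ and $\iota_{FGV}$.]

The upper triangle is obtained by applying $F$ to the uniformity equation for $G$
and using the standardness of $F$ as well. Everything else commutes easily. We
now use Proposition [5] to calculate:

$$\chi_V \cdot \mu_V \cdot \pi_{TV} \cdot F\rho_V = \chi_V \cdot T\chi_V \cdot \pi_{TV} \cdot F\rho_V = \iota_{FGV}.$$

This completes the proof that $FG$ is uniform. ⊣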


Theorem 9. Let $(T^H, \mu^H, \eta^H)$ be a standard iterative monad which is free on
a uniform functor $H$. Then $T^H$ is also uniform.


Proof In this proof we have the monad of $H$ and also the monad of $P$. As in
our statement, we write the data coming from the first free completely iterative
monad with the superscript $H$, and we continue our practice of dropping the
superscripts on the free completely iterative monad of $P$.

We know that $T^H$ is standard by Lemma [B]. Let $\pi : H \to T$ be a uniformity
for $H$. Let $\pi' : H \to PT$ be such that $\pi = \tau \cdot \pi'$. By the fundamental freeness
theorem of Aczel et al. [2], there is a unique ideal monad morphism $\pi^* : T^H \to T$
such that $\pi = \pi^* \cdot \kappa^H$. We check that $\chi_V \cdot \pi^*_V = \iota_{T^H V}$.

Consider the following diagram:

[Commutative diagram (4), only partly recoverable from the source. Objects: $T^H V$, $HT^H V + V$, $TT^H V + V$, $TV$, $TTV + V$, $V$, $V + V$.]   (4)


We claim that both halves commute. The bottom uses Proposition [5]. For the top,
it is best to begin at $HT^H V + V$ and argue separately for the two components.
The right component commutes due to the fact that $\pi^*$ is a monad morphism;
specifically, $\pi^* \cdot \eta^H = \eta$. The left component is more involved. We drop $V$ and
consider the following diagram in the endofunctor category:

[Commutative diagram, only partly recoverable from the source. Objects: $T^H$, $HT^H$, $PTT^H$, $TT^H$, $PT$, $PTT$, $T$, $TT$.]

For the hexagonal region in the upper left, we appeal to Lemma 6.10 of [16].
The region on the right commutes by naturality of $\tau$. The bottom square is an
instance of Proposition [1]


At this point we know that (4) commutes. We conclude that g = x- mj 
satisfies 


[v.V]-To- (mv +V) a} = g. 


By our Solution Lemma [2] there is exactly one g which satisfies this equation. 
We check that $\iota_{T^H V}$ also satisfies it. We note that the diagram below commutes:

[Commutative diagram, only partly recoverable from the source. Objects: $T^H V$, $HT^H V + V$, $PTT^H V + V$, $TT^H V + V$, $V$, $TV + V$; the bottom arrow is $[\chi_V, V]$.]

In the topmost region, we have used the fact that $\pi = \tau \cdot \pi'$. To see that the
triangle on the left commutes, recall that $\alpha^H = [\tau^H, \eta^H]^{-1}$ is the identity and
that $\iota_{HT^H V + V} = [\iota_{HT^H V}, \iota_V]$. We have used the fact that $\pi$ is uniform in the
middle region, and on the right we have the definition of $[\iota_{T^H V}]$.

This concludes the proof that $\chi_V \cdot \pi^*_V = \iota_{T^H V}$. ⊣


6 Consequences of Uniformity 


Theorem 10 below is an adaptation of the analogous result from [18], and ultimately
the ideas come from Turi [21], following Aczel [1]. We remind the reader
that AFA is needed in the results of this section.


Theorem 10. Let H be uniform. Then (H*,id) is a final H-coalgebra, where 
H* is the greatest fixed point of H. 


Proof $H$ is standard, so we may use Lemma [6]. Since we are assuming AFA,
Lemma [B] applies. Let $e : b \to Hb$. There is a unique $s = f^{\dagger} : b \to V$ such that
$s = [s] \cdot \pi_b \cdot e$. Consider the following diagram.

[Commutative diagram, not recoverable from the source; its vertices include $HV$ and $V$, and its bottom arrow is $s$.]

All the parts clearly commute except the left, and thus this does commute. This
part shows that $s = \iota_{HV} \cdot Hs \cdot e$. For the uniqueness, note that $s$ with our desired
property determines a solution to $\pi_b \cdot e$. ⊣


Corollary 1. If $H$ is uniform, then $H$ generates a standard iterative monad by
taking for each class $a$, $Ta = (H + C_a)^*$, the greatest fixed point of $H + C_a$.


Proof We know that for all sets $a$, $H + C_a$ is uniform and standard. So the
result follows from Theorem [10]. ⊣

As shown in Milius [14], if $H$ is any iteratable functor (on any category with
+) and $T$ and $\tau$ are from its free completely iterative monad, then
$(TA, \tau_A : HTA \to TA)$ is always a cia for $H$. The next fact does not follow from Milius’
result.


Theorem 11. If $H$ is uniform, then $(V, \iota_{HV})$ is a cia for $H$.


Proof The proof is virtually the same as that of Theorem 10, so we merely
indicate the idea and exhibit the diagram. Let $e : X \to HX + V$. Consider the
diagram below:

[Commutative diagram, only partly recoverable from the source. Objects: $X$, $HX + V$, $TX + V$, $HV + V$, $TV + V$, $V$, $V + V$. Arrow labels include $e$, $f$ and $[\iota_{HV}, V]$.]

The map $f$ comes from Lemma P] applied to $(\pi_X + V) \cdot e$. The rest of the proof
is the same. ⊣


7 A Variation: A-Uniformity 


For each cardinal $\lambda$, consider the functor $P_\lambda$ giving the set of subsets of size less
than $\lambda$:
$$P_\lambda(s) = \{t \subseteq s : |t| < \lambda\}.$$


We have a natural transformation $n_\lambda : P_\lambda \to P$ whose components are the
evident inclusions.
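For finite sets and finite bounds the definition of $P_\lambda$ can be written out directly. This is only an illustrative sketch: the function names `p_lambda` and `powerset` and the restriction to finite bounds are assumptions made here, and the sketch does not cover the infinite cardinals that the text is mainly concerned with.

```python
from itertools import combinations


def p_lambda(s, lam):
    """All subsets of the finite set s with fewer than lam elements."""
    elems = list(s)
    max_size = min(lam - 1, len(elems))
    return {frozenset(c) for k in range(max_size + 1) for c in combinations(elems, k)}


def powerset(s):
    """The full powerset of the finite set s (every subset has size < len(s) + 1)."""
    return p_lambda(s, len(s) + 1)


s = {1, 2, 3}
small = p_lambda(s, 2)       # subsets of size < 2
```

In this finite setting the inclusion of the bounded powerset into the full powerset is simply the subset relation between `small` and `powerset(s)`.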


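Proposition 7. $(V, \iota_{P_\lambda V})$ is a cia for $P_\lambda$. Specifically, given a flat equation
morphism $e : X \to P_\lambda X + V$, we have a flat equation morphism for $P$:
$((n_\lambda)_X + V) \cdot e$. The solutions to these two are the same morphism.


The natural transformation $\nu_\lambda : P_\lambda \to T$ shows $P_\lambda$ to be uniform, where
$\nu_\lambda = \tau \cdot P\eta \cdot n_\lambda$. That is, it is ideal, and

$$\chi_V \cdot (\tau \cdot P\eta \cdot n_\lambda)_V = (\chi_V \cdot (\tau \cdot P\eta)_V) \cdot (n_\lambda)_V = \iota_{PV} \cdot \iota_{P_\lambda V, PV} = \iota_{P_\lambda V}.$$

We have used the calculation in Example [8]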

As a result, these functors $P_\lambda$ generate standard iterative monads on the
category of classes by taking greatest fixed points. Moreover, these functors have
a property that $P$ does not have: as functors on Set, they have final coalgebras.
Indeed, the greatest fixed point of $P_\lambda$ is the set $H_\lambda$ of sets $x$ such that $|tc(x)| < \lambda$,
where $tc(x)$ is the transitive closure of $x$.³ For $\lambda$ an infinite regular cardinal, this
is the same thing as saying that $|x| < \lambda$, and every $y \in tc(x)$ also has cardinality
$< \lambda$. The properties of this collection $H_\lambda$ are sensitive to the underlying set
theory. But assuming either the Foundation or Anti-Foundation Axioms, it is a
set and not just a class. As a result, the functors $P_\lambda$ determine standard iterative
monads $T_\lambda$ on Set. We use a subscript $\lambda$ to indicate the data from this monad.


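Proposition 8. The inclusion $\iota_{H_\lambda} : H_\lambda \to V$ is a morphism of cias for $P_\lambda$.


Using the freeness theorem of Aczel et al. [2], there is a unique ideal monad
morphism $\bar\nu_\lambda : T_\lambda \to T$ such that $\nu_\lambda = \bar\nu_\lambda \cdot \tau_\lambda \cdot P_\lambda \eta_\lambda$. All of the components of $\bar\nu_\lambda$
are inclusions.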

We rework the results of Sections [4]and [5] by replacing P by P) throughout. 
The first step is to comment on the morphisms [f], associated to morphisms 
f : B — Hy). We define [f], : TB > Hy by a# -Ty f, where a# :T,Hy > Hy 
is the solution to ay : T > HaT + Hh in the cia (Ay, id). 


Proposition 9. For all sets B and all functions f : B — Hy, the diagram 
below commutes: 


TB jth Hy 


eo| |in 


TB Ta fl À 


Proof We consider the following diagram: 


Taf at 





TNB — ae. Hy (5) 
(TX )B (DX) Hy TV tH) 
e| ES 
TB ————TH, TV ~ V 








Tf Tin, F 


The leftmost two squares commute by naturality. The morphism @ : DV — V 
is the solution of the flat equation morphism ay : DV —> PXD V + V. The 
square on the right takes an argument. Let G be the functor ar Pya-+ A). 
Then the greatest fixed point G* gives a final coalgebra with the identity as 
structure map. Recall Lemma|6] for G and ay :T,V — G(T) V). We check that 
both tx, - a® andô- Tyi H, are solutions of ay. The verifications are easy and 
we omit them. 


3 We do apologize for any notational confusion that could result from our use of H 
for a functor and to designate an operation from cardinals to sets.

The commutativity of the triangle also takes an argument. We show that
a: (7)v has the property which uniquely defines â; that is, that it is a solution to 
ay. By Proposition[7] we only need to show that @-(7)v is a solution to TAV > 
PT,V + V. But this follows from the compositionality identity (see, e.g., [[4]) 
and the fact that (7)v reorganizes the flat morphism T,V — PT)\V + V to 
TV —PIV+YV. 

The commutativity of the outside of (5) implies the result of this lemma, in 
view of the definitions of [f], and fim, - f]. 4 


In the definition and results below, we recall that a standard functor H : 
Set — Set extends to a standard endofunctor on classes. We identify the two 
functors. 


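Definition 4. $H$ is $\lambda$-uniform if $H$ is uniform via some $\pi : H \to T_\lambda$ such that
for all sets $a$,

$$[\iota_{a, H_\lambda}]_\lambda \cdot \pi_a = \iota_{Ha, H_\lambda}.$$


Proposition 10. $H$ is $\lambda$-uniform iff there is some uniformity $\rho : H \to T$ which
factors through $\bar\nu_\lambda$.


Proof First, suppose that $H$ is $\lambda$-uniform via $\pi : H \to T_\lambda$. Let $\rho = \bar\nu_\lambda \cdot \pi$.
Then the diagram below shows that for all sets $a$, $[\iota_a] \cdot \rho_a = \iota_{Ha}$.

[Diagram missing from the source.]

We used Proposition [9]. In the other direction, suppose that $\rho$ is a uniformity for
$H$ which factors as $\rho = \bar\nu_\lambda \cdot \pi$. Then the lower passage above is an inclusion. So
since $\iota_{H_\lambda}$ is an inclusion, so is $[\iota_{a, H_\lambda}]_\lambda \cdot \pi_a$. ⊣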


This shows that if $H$ is $\lambda$-uniform, then $H$ is uniform. And it is also easy to
check that if $\lambda < \kappa$ and $H$ is $\lambda$-uniform, then $H$ is $\kappa$-uniform.

As we mentioned above, the results of this paper which we established for our
notion of uniformity may be reworked for the refined versions of $\lambda$-uniformity.
For example, the version of Theorem [10] gives the following result.


Proposition 11. If $H$ is $\lambda$-uniform, then $H^* \subseteq H_\lambda$. In particular, there is a
final coalgebra for $H$ which is a subset of the set of sets of hereditary cardinality
$< \lambda$.


Turning to the closure properties of the collection of $\lambda$-uniform functors,
the main point is that the constant functor with value $w$ is $\lambda$-uniform iff
$w \in H_\lambda$. And we see that any functor built from constants $w \in H_\lambda$, $P_\lambda$, product,
and coproduct has a final coalgebra which is a set and moreover is a subset of
$H_\lambda$. This final result gives an application of our work to the topic of bounded
functors on Set.

8 Concluding Remarks


The main point of this paper has been to rework the theory of uniformity using 
some of the machinery introduced in coalgebraic recursion theory in past years, 
including the notions of a completely iterative monad and a completely iterative 
algebra. As we have seen, there are two different intuitions at work, two different 
goals for the study. In a sense, one wants to find functors with the nice property 
that their greatest fixed points are final coalgebras, and then the technical details 
lead one to propose definitions that are about functors working by a general form 
of substitution. 

As it happens, the notions of uniformity that attempt to get at the intuition 
that a functor is determined “by substitution” in some sense single out a smaller 
class than those which give final coalgebras by considering greatest fixed points. 
The referee of this paper mentions the functor which maps each set to the set
of all its finite multisets as an example. I shall work instead with the distribution
functor $D$ on sets, where $D(X)$ is the set of all finite partial functions $\mu$ from
$X$ to $(0,1]$ such that $\sum_{x \in X} \mu(x) = 1$. (Equivalently, one may work with total
functions whose value is 0 at all but finitely many points. However, this
alternative would not define a monotone functor.) On morphisms, $D$ works by
marginalization (summing). The details are technical but it seems intuitively
clear that $D$ is not uniform in our sense (or under any definition stated in terms
of natural transformations and maps like $\chi$). At the same time, it is the case
that the greatest fixed point of $D$ is a final coalgebra with the identity, and the
universe is a cia for it with the inclusion. For $D$ itself, this is easy to see as $D^*$
is a singleton $x = \{\{(x,1)\}\}$. Things are more interesting for variants such as
$H(x) = D(x) + A$ for a fixed set $A$. We show by example that $(H^*, id)$ is a
final coalgebra, invoking Lemma [6]. Let $b = \{w, x, y, z\}$, let $a \in A$, and consider
$e : b \to Hb$ given on the left below:


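$$\begin{array}{ll}
e(w) = \mathrm{inl}\,\{(x,1/3),(y,1/3),(z,1/3)\} & f(w) = (0,\{(x,2/3),(z,1/3)\}) \\
e(x) = \mathrm{inl}\,\{(x,1)\} & f(x) = (0,\{(x,1)\}) \\
e(y) = \mathrm{inl}\,\{(y,1)\} & \\
e(z) = \mathrm{inr}\,a & f(z) = (1,a)
\end{array}$$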


To get the desired $s : b \to V$, we must identify $x$ and $y$ (since they are bisimilar
in $e$; this is the reason why uniformity in the sense of this paper fails). We do this
in the system $f$. This system has a unique solution $s^*$, by standard techniques.
We then extend $s^*$ to the desired $s$ by $s(y) = s^*(x)$.
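The passage from the system e to the system f can be sketched in code. The block below is a hedged illustration, not the paper's method: it assumes that distributions are dictionaries with `Fraction` weights, and `push_forward` and `merge` are hypothetical names for the action of $D$ on a function (marginalization) and for the map identifying the two bisimilar points.

```python
from fractions import Fraction


def push_forward(q, mu):
    """Action of D on a function q: sum the weights of all points with the same image."""
    out = {}
    for point, weight in mu.items():
        out[q(point)] = out.get(q(point), Fraction(0)) + weight
    return out


third = Fraction(1, 3)
# The distribution parts of e; e(z) carries no distribution and is omitted here.
e = {
    "w": {"x": third, "y": third, "z": third},
    "x": {"x": Fraction(1)},
    "y": {"y": Fraction(1)},
}


def merge(v):
    """Identify the bisimilar points x and y."""
    return "x" if v == "y" else v


# The system f lives on {w, x, z}: push each remaining distribution along merge.
f = {v: push_forward(merge, mu) for v, mu in e.items() if v != "y"}
```

Here `f["w"]` gives weight 2/3 to `x` and 1/3 to `z`, matching the right-hand column of the example, and the distributions of `x` and `y` become equal after merging.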

The results here extend to show that every functor built from D and the 
polynomial-forming operations (except of course for the identity functor) has 
the properties of interest in this paper. One can even imagine re-working the 
definition of uniformity in this paper to allow D and related functors to be 
uniform. However, doing this in an ad hoc manner gives no insight to help with 
a search for the most general uniformity notion.

Abstract. In recent years an increasing interest in regular sets for different
kinds of elements could be observed. The introduction of XML has
led to investigations of regular sets of both ranked and unranked trees
and also of attributed unranked trees.

The aim of this short note is to introduce a uniform notion of regularity.
If instantiated for strings, ranked trees and unranked trees it will coincide
with the existing concepts and it can easily be extended to arbitrary data
types. This leads to a natural notion of regularity for different kinds of
attributed unranked trees and also to regular sets of structured elements
which have not yet been investigated. The approach takes advantage
of freeness constraints and parametric abstract data types as offered
by the algebraic specification language CASL.


1 Introduction 


It is well known that strings and ranked trees can be interpreted as ground terms 
of suitable ranked signatures (alphabets). 

In the case of strings over a finite alphabet $X$ the corresponding signature
$\Omega_{string}$ consists of a constant $\varepsilon$ and a unary operation for each letter $x \in X$. For
each $\Omega_{string}$-algebra $A$ there exists a unique homomorphism $f_A : T(\Omega_{string}) \to A$
where $T(\Omega_{string})$ denotes the free term algebra for the signature $\Omega_{string}$. For
each term $t = \varepsilon x_1 \ldots x_n$ the homomorphism $f_A$ maps $t$ to the evaluation of $t$ in
$A$. Now it is folklore that a subset $L \subseteq T(\Omega_{string})$ is regular if and only if there
is a finite $\Omega_{string}$-algebra $A$ with a distinguished subset $A_0 \subseteq A$ such that


$$L = f_A^{-1}(A_0).$$


This means that $L$ is the homomorphic inverse image of an accepting set $A_0$ of
states of the finite automaton $A$.
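The characterization of a regular set as an inverse image can be illustrated with a small finite algebra. This is a sketch under assumptions: the alphabet {a, b}, the two-state algebra tracking the parity of the letter a, and the names `StringAlgebra`, `evaluate` and `in_language` are invented here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable


@dataclass(frozen=True)
class StringAlgebra:
    """A finite algebra: a constant for epsilon and one unary operation per letter."""
    epsilon: Hashable
    ops: Dict[str, Dict[Hashable, Hashable]]   # letter -> (state -> state)
    accepting: FrozenSet[Hashable]             # the distinguished subset A_0


def evaluate(alg, word):
    """The unique homomorphism applied to the term epsilon x_1 ... x_n."""
    state = alg.epsilon
    for letter in word:
        state = alg.ops[letter][state]
    return state


def in_language(alg, word):
    """Membership in the inverse image of the accepting subset."""
    return evaluate(alg, word) in alg.accepting


even_as = StringAlgebra(
    epsilon="even",
    ops={"a": {"even": "odd", "odd": "even"}, "b": {"even": "even", "odd": "odd"}},
    accepting=frozenset({"even"}),
)
```

With this algebra, `in_language(even_as, w)` holds exactly for the words `w` containing an even number of occurrences of `a`, so the algebra plays the role of a deterministic finite automaton.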

It has been shown by Thatcher that a corresponding characterization holds
for regular sets of ranked trees.

It turns out that for the intended generalization it will be more convenient to 
work with partial algebras and weak homomorphisms between partial algebras. 

In order to define unranked trees and other types of structured data, we
will work with more general algebraic structures than those usually used in Universal
Algebra. The generalizations concern the domains of fundamental operations. In
a many-sorted framework the domains of fundamental operations are assumed
to be products of sorts and the codomain to be one of the given sorts. We will
allow that the domain of a fundamental operation can be an abstract data type
on the given sorts. Such structures arise naturally if one works with many-sorted
algebras and freeness constraints which are basic in the algebraic specification
language CASL.


One example for this more general concept is given by list-algebras. For list- 
algebras the domain of a fundamental operation may be the set of all finite lists 
with elements out of the carrier set of the list—algebra. 


Let us have a first look at the simplest case of list-algebras. We consider the
signature with just one sort s and one operation symbol of type c : s_lists →
s. What about the ground terms generated by that signature? Since there
is the empty list, denoted by [], there is the ground term c([]). This ground
term can be used to build for instance the list [c([]), c([]), c([])] consisting of
three copies of the previously constructed ground term. This list yields a new
ground term c([c([]), c([]), c([])]). Similarly one could construct c([c([]), c([])]) and
c([c([]), c([c([]), c([])]), c([])]) and so on. Evidently, the ground terms represent the
construction of list of list of ... list of the empty set, which may also be seen as
finite unranked ordered trees. This corresponds to the well known specification
of finite unranked ordered trees as an inductively defined data type.
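The ground terms just described, and their evaluation in a list-algebra, can be sketched as follows. This is an illustration under assumptions: a ground term c(l) is represented by the Python list l itself, and the two sample list-algebras on the natural numbers (`count_nodes`, `height`) are invented here.

```python
def evaluate(c_op, term):
    """Evaluate a ground term, given as a nested list, in the list-algebra with operation c_op."""
    return c_op([evaluate(c_op, child) for child in term])


# The ground term c(l) is represented by the Python list l, so c([]) is [].
leaf = []                                   # c([])
three = [leaf, leaf, leaf]                  # c([c([]), c([]), c([])])
mixed = [leaf, [leaf, leaf], leaf]          # c([c([]), c([c([]), c([])]), c([])])


def count_nodes(values):
    """A list-algebra on the natural numbers: one node plus the sizes of the subtrees."""
    return 1 + sum(values)


def height(values):
    """Another list-algebra on the natural numbers: the height of the tree."""
    return 1 + max(values, default=0)
```

Each choice of `c_op` fixes one list-algebra, and `evaluate` is then the unique homomorphism from the ground terms into it; for instance `evaluate(count_nodes, three)` counts the four nodes of the corresponding unranked tree.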


It is well known that regular sets of finite unranked ordered trees can also
be characterized as inverse homomorphic images. But taking only finite
list-algebras would lead to a more general concept. For the characterization of regular
sets of finite unranked trees one has to use an additional property which leads to
so-called regular list-algebras, where a list-algebra is called regular if the set of
lists (sequences) of elements mapped by the fundamental operation to one and
the same element is always a regular set of lists (sequences). A finite regular
list-algebra is a finite deterministic bottom-up tree automaton in the terminology
of [1]. Therefore, in the following we will also speak of states if we talk about
elements of finite regular algebras.


This encourages us to call a subset of an inductively defined data type regular 
if it is the inverse homomorphic image of an accepting subset of a finite regular 
algebra (of a corresponding generalized type). In this way, the definition and 
investigation of regular expressions can very generally be based on operations 
on classes of finite regular algebras. This leads to a uniform view of regular 
expressions for different types of structured data. 


It is worth mentioning that finite regular algebras coincide with finite algebras 
in the case of traditional algebras for ranked alphabets. 


This short note may be seen as a straightforward generalization of J.W.
Thatcher’s work on tree automata. This generalization does not include regularity
for sets of infinite data structures like streams or infinite lists.

2 Algebraic Operations with Structured Domains


Traditional algebraic structures use on the meta level only the type constructors 
of Cartesian products. Algebraic structures in the framework of category theory 
use for typing arbitrary endofunctors T : Set — Set and define an algebra as 
a pair (A,a : T(A) — A). In the case of many-sorted algebras one has to use 
endofunctors T : Set® — Set” with a finite set S of sort names. 

In this note we will work with endofunctors that can be built up by finite
Cartesian products and by generic free data types in the sense of the algebraic
specification language CASL [4]. This requirement rules out for instance the
powerset functor, but it allows the powerset functor $P_\omega(\_)$ of finite subsets. This
means that we use ideas and concepts which came up very early within the
theory of abstract data types, see for instance [2], [8], [5], [6] and [7].

The following are some examples of extended signatures for these more gen- 
eral algebras: 


SIG ListAlgebras IS
SORTS s
OPS c : s_lists ---> s
END


SIG Attributed1ListAlgebras IS
SORTS s1, s2
OPS c : s2 x s1_lists ---> s1
END


SIG SetAlgebras IS
SORTS s
OPS c : s_sets ---> s
END


SIG Attributed2ListAlgebras IS
SORTS s1, s2
OPS c : s1_lists x s2_lists ---> s1
END


An algebra $A = (A_{s1}, A_{s2}; c_A)$ for the signature Attributed2ListAlgebras
is then given by an arbitrary set $A_{s1}$, the interpretation of the sort name s1,
a second set $A_{s2}$, the interpretation of the sort name s2, and a mapping
$c_A : A_{s1}^* \times A_{s2}^* \to A_{s1}$. Accordingly an algebra $A = (A_s; c_A : P_\omega(A_s) \to A_s)$ for the
extended signature SetAlgebras is given by the interpretation of the sort name
s and the interpretation of the operation symbol c which assigns to each finite
subset of the carrier set an element of the carrier set.

Since we want to define regular subsets of initial algebras of extended signatures,
we will first have a look at the initial algebras of the given extended
signatures.

As described in the introduction the initial algebra of the extended signature
ListAlgebras represents finite ordered unranked trees. However, the initial
algebra of the signature Attributed1ListAlgebras is given by the empty set
for both sort names. But the intended meaning of the signature is the set of
finite ordered unranked trees whose nodes are labelled with elements out of the
interpretation of s2. For the same reason the initial algebra of the signature
Attributed2ListAlgebras differs from the intended meaning. In that case the
nodes should be labelled with lists of elements out of the interpretation of the
sort name s2. In both cases the difference is caused by the fact that the initial
algebras interpret the sort name s2 by the empty set.

Finally, we see that the initial algebra of the signature SetAlgebras repre- 
sents finite unordered and repetition free trees. 

The problems described above can be solved by the use of parametric extended
signatures. If one uses the sort name s2 as a parameter then each interpretation
of this sort name produces an instantiated extended signature. Now,
for each nonempty interpretation of the sort parameter the initial algebras of
the instantiated extended signatures represent the intended meaning.

Parameterized extended signatures are just syntactic sugar for the representation
of families of extended signatures. If one interprets the sort parameter s
by a set M then the instantiated extended signature results by adding s as a
sort symbol and each element m of M as a constant operation symbol m : s, and one
has to fix the interpretation of the sort name s to the set M.

In the following we will work with one instance of the parametric version of
Attributed2ListAlgebras where the parameter sort s2 is instantiated by the
alphabet {a,b,A,B}. The resulting extended signature is


SIG ALAlg IS
SORTS s
OPS c : {a,b,A,B}* x s_lists ---> s
END


3 Regular Subsets of Initial Extended Algebras 


In the case of strings and finite ranked trees regular subsets can be characterized 
as inverse homomorphic images of subsets of finite algebras. The example of finite 
ordered unranked trees shows that in general finite algebras are not sufficient to 
characterize regular subsets by inverse homomorphic images. More specific finite 
algebras are needed. 

Definition 3.1. Let Sig be an extended signature such that for all type 
constructors used in definitions of the types of domains of the operation symbols 
the notion of regular subsets is known. Let A be a finite algebra for the given 
signature. The finite algebra is called regular if for each operation and each 
element of a carrier the inverse image of that element is a regular subset of the 
domain.

If the used type constructors of a signature preserve finite sets, then evidently
each finite algebra is regular. Therefore, the concept of regularity of finite algebras
is not needed in case of regular strings (lists) or finite ranked trees.

Definition 3.2. Let Sig be an extended signature such that for all type 
constructors used in definitions of the types of domains of the operation symbols 
the notion of regular subsets is known. A subset of the initial Sig—algebra T(Sig) 
is called regular if it is the inverse homomorphic image of a subset of a finite 
regular Sig-algebra. 

The extended signatures above use only products and lists as type construc- 
tors. Therefore, this definition can be used to define regular sets for the defined 
types of trees. 

In a next step trees could be used as type constructors in order to define
other structured data types. Definition 3.2 then provides notions of regular sets
for the resulting structured data types. Since each interesting data type can be
specified by freeness constraints using only finitely many auxiliary data types,
also defined by freeness constraints, Definition 3.2 allows one to define the concept
of regular sets for all interesting data types, using a suitable hierarchy of type
definitions by freeness constraints.

Since Attributed1ListAlgebras-algebras are deterministic bottom-up automata
as introduced in [1], the notion of regular sets for the extended signature
Attributed1ListAlgebras coincides with the notion of tree regular languages
according to Definition 2.14 in [1].

Let us apply Definition 3.2 to the extended signature ALAlg. A finite regular
ALAlg-algebra $A = (A_s; c_A)$ assigns to each pair of a finite list of elements out of $A_s$
and a finite list of elements out of {a,b,A,B} an element in $A_s$.

The unique homomorphism from the initial ALAlg-algebra to a specific finite
regular ALAlg-algebra

$A = (A_s; c_A)$


defines a classification of the trees where each class is given by the inverse
homomorphic image of an element in $A_s$. The basic operation $c_A$ assigns to each
class a regular set over {a,b,A,B} whose elements can be used as attributes for
the root of trees out of the corresponding class.

We will illustrate this by an example: 
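Example 3.1: Let $E = (E_s; c_E)$ be given by:

$E_s = \{s_0, s_1, s_2, s_3, s_d\}$

for $(w,l) \in \{a,b,A,B\}^* \times E_s^*$:

$$c_E(w,l) = \begin{cases}
s_0 & \text{if } w \in L(a^*),\ l \in \{nil\} \\
s_1 & \text{if } w \in L(b^*),\ l \in \{s_0 s_0,\ s_0 s_0 s_0\} \\
s_2 & \text{if } w \in L((A+B)a^*),\ l \in \{s_1 s_1,\ s_2 s_1\} \\
s_3 & \text{if } w \in L((A+B)b^*),\ l \in \{s_2 s_2 s_2,\ s_1 s_2 s_3\} \\
s_d & \text{else}
\end{cases}$$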

and let us assume that $\{s_3\} \subseteq E_s$ is the set of accepting states.

Then we have four interesting classes of attributed trees, defined by the
inverse homomorphic images of $s_0, s_1, s_2, s_3$ respectively, and the complement of
the union of these classes, represented by $s_d$.

$s_0$ represents the class of all trees with exactly one node labelled with a string
$w \in L(a^*)$.

$s_1$ represents the class of trees with two or three sons, classified by $s_0$, and
the root is labelled with a string $w \in L(b^*)$.

$s_2$ represents the class of trees with two sons such that the first is classified
by $s_1$ or $s_2$ and the second son is classified by $s_1$. Finally the root is labelled
with a string $w \in L((A+B)a^*)$.

Finally, the accepting state $s_3$ represents the class of trees with exactly three
sons, where either all of them are classified by $s_2$ or the first son is classified by
$s_1$, the second by $s_2$ and the third again by $s_3$. The root is labelled by a string
$w \in L((A+B)b^*)$.
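The algebra of Example 3.1 can be run as a bottom-up tree automaton. The following is a sketch under assumptions: trees are modelled as pairs `(word, list_of_subtrees)`, the admissible child-state lists are transcribed from the example as far as the source allows, and the names `RULES`, `c_E` and `classify` are illustrative only.

```python
import re

# (regular expression for the attribute word w, admissible child-state lists l, result)
RULES = [
    (r"a*",     {()},                                      "s0"),
    (r"b*",     {("s0", "s0"), ("s0", "s0", "s0")},        "s1"),
    (r"[AB]a*", {("s1", "s1"), ("s2", "s1")},              "s2"),
    (r"[AB]b*", {("s2", "s2", "s2"), ("s1", "s2", "s3")},  "s3"),
]
ACCEPTING = {"s3"}


def c_E(w, states):
    """The basic operation of E; every case not listed falls into the trap state sd."""
    for pattern, lists, result in RULES:
        if re.fullmatch(pattern, w) and tuple(states) in lists:
            return result
    return "sd"


def classify(tree):
    """Bottom-up evaluation of an attributed tree given as (word, list_of_subtrees)."""
    w, children = tree
    return c_E(w, [classify(child) for child in children])


t0 = ("aa", [])                 # a single node labelled by a word in L(a*)
t1 = ("b", [t0, t0])            # two sons classified by s0
t2 = ("Aa", [t1, t1])           # two sons classified by s1
t3 = ("Bbb", [t2, t2, t2])      # three sons classified by s2
bad = ("b", [t0])               # only one son, so no rule applies
```

A tree belongs to the regular set of the example when `classify(tree)` lies in `ACCEPTING`. Since no rule lists `sd` among the child states, any tree with a subtree classified by `sd` is itself classified by `sd`.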

The example shows that the explicit definition of the basic operation of a
finite regular algebra for the extended signature ALAlg has great similarity with
a tree grammar, where the elements act as meta variables.

The example also shows another aspect: it is basically a partial ALAlg-algebra
with a one-point completion given by the state $s_d$. In terms of automata this
completion point is a trap. If the computation once reaches the trap it can never
leave it.


4 Regular Expressions Based on Colimits 


The investigation of regular expressions can now be based on colimits in suitable 
categories. The well known interpretation of the operations of regular expression 
as operations on finite automata can now be extended to operations on finite 
regular extended algebras. 

We will illustrate this by means of the category of finite partial regular
ALAlg-algebras as objects and weak homomorphisms as morphisms.

Weak homomorphisms preserve the applicability of the basic operations but
do not necessarily reflect this property. To be more precise, let $A$, $B$ be partial
algebras. Then $c_A : \{a,b,A,B\}^* \times A_s^* \rightharpoonup A_s$ and $c_B : \{a,b,A,B\}^* \times B_s^* \rightharpoonup B_s$
are partial mappings. A total mapping $f : A_s \to B_s$ is a weak homomorphism
if for each $(x,y) \in dom(c_A)$ the pair $(x, f^*(y)) \in dom(c_B)$ and
$f(c_A(x,y)) = c_B(x, f^*(y))$, where $f^* : A_s^* \to B_s^*$ is the canonical extension to lists.
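The weak-homomorphism condition can be checked mechanically on a finite fragment of the operations. This is a sketch under assumptions: the partial operations are tabulated as dictionaries over finitely many arguments (the real operations range over all words), lists are represented as tuples, and the names `lift` and `is_weak_homomorphism` are invented here.

```python
def lift(f, states):
    """The canonical extension of f to lists (represented as tuples)."""
    return tuple(f[s] for s in states)


def is_weak_homomorphism(f, c_A, c_B):
    """Check the weak-homomorphism condition on every tabulated argument of c_A.

    c_A and c_B are dicts from (word, tuple_of_states) to a state; a missing key
    means the partial operation is undefined there.
    """
    for (w, states), value in c_A.items():
        image = (w, lift(f, states))
        if image not in c_B:              # applicability must be preserved
            return False
        if f[value] != c_B[image]:        # and the results must agree
            return False
    return True


# A is defined only on leaves; B additionally knows how to combine two leaves.
c_A = {("a", ()): "p"}
c_B = {("a", ()): "q", ("b", ("q", "q")): "r"}
f = {"p": "q"}               # a map from the states of A to the states of B
g = {"q": "p", "r": "p"}     # a map in the other direction
```

In this toy example `f` is a weak homomorphism from A to B, while `g` is not one from B to A, because the operation of B is applicable to a pair of leaves and the operation of A is not.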

From category theory it is known that arbitrary finite colimits exist if sum 
and coequalizer exist [3]. Therefore, the most interesting constructions on alge- 
bras are summation and quotient construction. 

Before we study this construction in detail we introduce a notation. For a
given finite partial regular ALAlg-algebra $A$ and a given subset $X \subseteq A_s$ of
accepting states, $L(A, X)$ denotes the regular set of those ground terms for the
extended signature ALAlg which can be evaluated in $A$ to a value out of $X$. To
be more formal, if $A'$ denotes the one-point completion of $A$ and

$$f_{A'} : T(ALAlg) \to A'$$

the unique homomorphism then we define

$$eval(A) = \{t \in T(ALAlg) \mid f_{A'}(t) \in A_s\},$$
$$L(A, X) = \{t \in T(ALAlg) \mid f_{A'}(t) \in X \subseteq A_s\}.$$


With this notation it is easy to see that the empty set of ground terms and
the set of all ground terms are both regular. For the empty set one takes an
algebra $A = (A_s; c_A)$ where there is no $w \in \{a,b,A,B\}^*$ with $(w, nil) \in dom(c_A)$
and for the second case one takes the total one-element algebra where the only
element is also an accepting one.

For a finite partial regular algebra $A = (A_s; c_A)$ we call an element $c_A(w, nil)$,
if it exists, an initial state of $A$.

In the following we describe the sum of two algebras $A$, $B$. First we take the
disjoint union $A_s + B_s$. Let $in_A : A_s \to A_s + B_s$, $in_B : B_s \to A_s + B_s$ be the
injections. $A_s + B_s$ becomes the carrier of $A + B$. The basic operation $c_{A+B}$ is
defined as follows

$$c_{A+B}(w,l) = \begin{cases}
c_A(w,l') & \text{if } l = in_A^*(l') \text{ and } (w,l') \in dom(c_A) \\
c_B(w,l') & \text{if } l = in_B^*(l') \text{ and } (w,l') \in dom(c_B) \\
\text{undefined} & \text{else}
\end{cases}$$


It is a matter of routine to show that this construction gives a sum in the
category of finite partial regular ALAlg-algebras with weak homomorphisms.
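
The sum construction can be illustrated on finite tables. The following is a minimal sketch, not the author's formulation: it assumes that the basic operation of each algebra is given as a finite Python dictionary (whereas the domain of a basic operation may in general be infinite), and it tags states with 'A' or 'B' to form the disjoint union. The names `Op`, `algebra_sum`, `c_a` and `c_b` are hypothetical. Since the empty list is the image of nil under both injections, the sketch lets the first case win when both tables define (w, nil), following the order of the cases in the definition.

```python
from typing import Dict, Hashable, Tuple

# A finite fragment of a basic operation c: (word, tuple of states) -> state.
# Keys that are absent stand for "undefined".
Op = Dict[Tuple[str, Tuple[Hashable, ...]], Hashable]


def algebra_sum(c_a: Op, c_b: Op) -> Op:
    """Tabulate c_{A+B}; states of the sum are tagged pairs ('A', s) or ('B', s)."""
    c_sum: Op = {}
    for tag, table in (("A", c_a), ("B", c_b)):
        for (w, args), result in table.items():
            key = (w, tuple((tag, s) for s in args))
            # The empty list is the image of nil under both injections,
            # so the earlier case (A) wins if both tables define (w, nil).
            c_sum.setdefault(key, (tag, result))
    return c_sum


# Hypothetical toy tables, not Example 3.1 from the text.
c_a: Op = {("a", ()): 0, ("A", (0,)): 1}
c_b: Op = {("b", ()): 0, ("B", (0, 0)): 0}
c_ab = algebra_sum(c_a, c_b)
```

Argument lists that mix states of both summands never occur as keys, which matches the "undefined else" case.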

Summation can be used to show that the union of two regular subsets is again
regular. If $X \subseteq A_s$, $Y \subseteq B_s$ are given sets of accepting states in $A$, $B$ respectively
and $X \uplus Y$ denotes the union of their embeddings in $A + B$ then

$$L(A, X) \cup L(B, Y) = L(A + B, X \uplus Y).$$











That regular subsets in $T(\mathrm{ALAlg})$ are closed under intersection can easily be
seen by means of the Cartesian product of finite partial regular algebras:

$$L(A, X) \cap L(B, Y) = L(A \times B, X \times Y).$$




















It is even simpler to see that the complement of a regular subset is regular. 
One takes just the complement of the accepting subset on states in a finite 
(total) regular algebra which represents the given regular subset and one gets 
the wanted algebra for the complement. 

Above we have seen that the sum of algebras represents the union of regular
sets and the sum of regular expressions. What about the composition and the
star-operation of regular expressions? The semantics of these operations can be
reduced to the quotient construction of finite partial regular algebras. It is sufficient
to define how states can be fused together.

Let $A = (A_s; c_A)$ be a given partial algebra and $x, y \in A_s$ two elements. We
define the quotient algebra $A^{x=y}$ which results from $A$ by fusing $x$ with $y$ as
follows.

Definition 4.1: An equivalence relation $R \subseteq A_s \times A_s$ on the carrier set
of a partial algebra $A$ is called a congruence if for all $w \in \{a,b,A,B\}^*$
and all $(a_1,a_1') \in R, \ldots, (a_n,a_n') \in R$, $n \geq 0$: if $c_A(w,[a_1,\ldots,a_n]) = a$ and
$c_A(w,[a_1',\ldots,a_n']) = a'$ then $(a,a') \in R$.

Definition 4.2: For a given partial ALAlg-algebra $A$ and a congruence $R$ in
$A$ the quotient $A/R$ has the quotient set $A_s/R$ as carrier and

$$c_{A/R}(w, [(a_1)_R, \ldots, (a_n)_R]) = (a)_R$$

if there are representatives $a_i' \in (a_i)_R$ for $i \in \{1,\ldots,n\}$ with
$c_A(w, [a_1',\ldots,a_n']) = a$,

where $(x)_R$ denotes the congruence class containing $x$.

Since congruences in the sense of Definition 4.1 are closed under intersections,
there exists for each set $X$ of pairs the smallest congruence containing $X$, which
is denoted by $R_X$. $R_X$ is also called the congruence generated by $X$.

The algebra $A^{x=y}$ can now be defined by

$$A^{x=y} = A/R_{\{(x,y)\}}.$$
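
For a finite table, the congruence generated by a set of pairs can be computed as a least fixed point. The sketch below is only an illustration under stated assumptions: the basic operation is given as a finite dictionary, the helper names `generated_congruence` and `quotient` are hypothetical, and a simple union-find structure stands in for the equivalence relation. Starting from the given pairs, it keeps merging the results of any two defined applications whose word agrees and whose argument lists are componentwise equivalent, until nothing changes.

```python
from itertools import combinations


def generated_congruence(states, table, pairs):
    """Return a map state -> class representative for the smallest congruence
    containing `pairs`, where `table` maps (word, args) -> state."""
    parent = {s: s for s in states}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    def union(s, t):
        rs, rt = find(s), find(t)
        if rs == rt:
            return False
        parent[rs] = rt
        return True

    for x, y in pairs:
        union(x, y)

    changed = True
    while changed:
        changed = False
        for ((w1, l1), r1), ((w2, l2), r2) in combinations(table.items(), 2):
            same_args = len(l1) == len(l2) and all(
                find(a) == find(b) for a, b in zip(l1, l2)
            )
            if w1 == w2 and same_args and union(r1, r2):
                changed = True
    return {s: find(s) for s in states}


def quotient(table, cls):
    """Tabulate the basic operation of the quotient on class representatives."""
    return {(w, tuple(cls[a] for a in args)): cls[r] for (w, args), r in table.items()}


# Hypothetical toy algebra: fusing 0 with 1 forces 2 and 3 to be fused as well.
states = [0, 1, 2, 3]
table = {("a", ()): 0, ("b", ()): 1, ("A", (0,)): 2, ("A", (1,)): 3}
cls = generated_congruence(states, table, [(0, 1)])
fused = quotient(table, cls)
```

The example shows why fusing two states is more than a renaming: the congruence condition propagates the identification to all states reachable by the same operations.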


Iterated application of this construction leads to the identification of a finite
set $\{x_1,\ldots,x_n\}$ of states with the state $y$, or to identifying $x_1$ with $y_1$, ..., $x_n$ with
$y_n$. The resulting quotient algebras will be denoted by

$$A^{\{x_1,\ldots,x_n\}=y} \quad\text{and}\quad A^{\{x_1=y_1,\ldots,x_n=y_n\}}$$

respectively.

For a given ALAlg-algebra $A$, a set $X \subseteq A_s$ of accepting states and a congruence
relation $R$, the set of accepting states in $A/R$ is given by $\{(x)_R \mid x \in X\}$.

By means of the introduced quotient construction on partial ALAlg-algebras
one can define a construction on algebras which corresponds to the *-operation of
regular expressions. The corresponding algebra $A^*$ can be constructed as the quotient
by the congruence relation which is generated by fusing each initial state with
each accepting state.

With respect to Example 3.1 the algebra $E$ has one initial state $s_0$ and one
accepting state $s_3$. This implies

The quotient construction together with the sum can be used to define an
operation on algebras which corresponds to the product of regular expressions
or the sequential composition of automata. The basic idea for the construction
of $A \cdot B$ is to define first a sum $A + (B + \ldots + B)$ which contains as many copies of
$B$ as $A$ has accepting states and to fuse each accepting state with the initial states
of the copy of $B$ which corresponds to the accepting state.

There is one problem left. Which algebras correspond to the atomic regular 
expressions? The corresponding concept of an atomic partial algebra depends on 
the given extended signature. We will discuss the case of atomic partial
ALAlg-algebras.

Let a finite set $\{a_0, a_1, \ldots, a_n\}$, $0 \leq n$, and regular sets $L_1 \subseteq \{a,b,A,B\}^*$,
$L_2 \subseteq \{a_1,\ldots,a_n\}^*$ be given. Then we call a partial ALAlg-algebra $A$ with
the carrier set $\{a_0, a_1, \ldots, a_n\}$ atomic if $\mathrm{dom}(c_A) = L_1 \times L_2$ and $c_A(w,l) = a_0$
for all $w \in L_1$, $l \in L_2$.

Finally we demonstrate that the partial ALAlg-algebra which results from 
Example 3.1 by removing the trap sg can be constructed out of atomic partial 
ALAlg—algebras using summation and quotient construction. 

We start with the following atomic algebras: 


1. $A$ with $A_s = \{a_0\}$, $L_1(A) = L(a^*)$, $L_2(A) = \{\mathrm{nil}\}$;

2. $B$ with $B_s = \{b_0, b_1\}$, $L_1(B) = L(b^*)$, $L_2(B) = \{b_1b_1, b_1b_1b_1\}$;

3. $C$ with $C_s = \{c_0, c_1, c_2\}$, $L_1(C) = L((A + B)a^*)$, $L_2(C) = \{c_2c_2, c_1c_2\}$;

4. with D = {do, dı, də, da}, Lı (D) = L((A+B)b*), Lə(D) = {dodzdz2, didzd3}; 
Then 


(A +B4+C+ pete) feo shi bomen oer comoda, boned} 


is a representation (up to isomorphism) of the partial ALAlg-algebra from
Example 3.1.

Based on this example it is not hard to see that each finite partial regular
ALAlg-algebra can be constructed out of atomic partial ALAlg-algebras using
summations and quotient constructions.


5 Conclusions 


We have introduced a uniform notion of regular sets which is applicable to ar- 
bitrary structured data types and which coincides with the existing notions of 
regular sets of words, ranked and unranked trees. The introduced notion is based 
on the observation that regular sets can be seen as subsets of free data types 
which are inverse homomorphic images of finite regular algebras (for suitable 
signatures). 

In this paper we have sketched, by a representative example, a purely algebraic
approach to the notion of regular sets for arbitrary data types. By a suitable
extension of the signatures of algebras the approach of Thatcher [14] to ranked
trees could be generalized to arbitrary structured data types. This approach can
be seen as an addition to approaches based on logics, see for instance [11] and
[12].

Algebras without rank have also been used earlier by Indermark [9]. But
Indermark uses unranked algebras in order to avoid many-sortedness.

The most closely related work is that of K. Hashiguchi, Y. Wada and S.
Jimbo, who use classical algebraic structures. They introduce binoids, which
have two associative binary operations and an identity for each operation. One
operation is used to represent the depth and the other to represent the width of
trees. The formal framework of this approach becomes rather complicated, since
a two-sorted structure has to be encoded in a one-sorted structure and the
unrestricted width and depth of unranked trees in two binary operations. This
approach does not offer a framework which can easily be extended to arbitrary
structured data types.

It remains for future work to give a complete formal presentation of the 
sketched approach. Additionally it would be interesting to know if the coalge- 
braic approach to regular expressions developed by J.J.M.M. Rutten [13] can be 
extended to the more general situation of regular sets of arbitrary structured 
data.

Abstract. From the range of techniques available for algebraic specifications
we select a core set of features which we define to be the elementary
algebraic specifications. These include equational specifications
with hidden functions and sorts and initial algebra semantics. We give
an elementary equational specification of the field operations and conjugation
operator on the rational complex numbers $Q(i)$ and discuss some
open problems.


For Joseph Goguen 


1 Introduction 


Joseph Goguen has a vision for the theory of computation. It is algebraic, it is 
comprehensive, and it is focussed on the world’s work. He uses a set of mathemat- 
ical tools from category theory and universal algebra to explore a vast landscape 
of fundamental concepts, system architectures, emerging technologies, and con- 
temporary practices in software development. He is a great explorer. His achieve- 
ment is a fine example of just how much intellectual ground can be covered in the 
lifetime of a brilliant computer scientist with energy, curiosity, technical insight
and a personal scientific agenda. He has reflected on his work on computing up 
to 1999 in Goguen [15]. There is so much to think about in this oeuvre. 

One early line of thought is the role of initial algebras in semantic modelling 
and specification, expounded in [19]. Our own work in algebraic specifications 
from 1979 onwards owes a great debt to Joseph Goguen and his colleagues Jim 
Thatcher and Eric Wagner who, writing as the ADJ Group, provided a perfect
mathematical basis for modelling and specifying abstract data types, starting
in [20]. The ADJ Group established most of the basic theory by combining the
technical ideas of many sorted algebras, equations, conditional equations, hidden 
functions and sorts, term rewriting and initial algebras. We continue to use many 
of their notations and techniques. Thanks to Sam Kamin [22], a rapidly growing 
literature was organised and problems identified and clearly stated, such as when 
were hidden functions and sorts necessary? Some of the more difficult problems 
of partiality, errors and parameterization also showed themselves in their early 
writing, problems which, after many long papers and books, are still not quite 
under control. Joseph Goguen and Eric Wagner have reflected on the ADJ Group 
in [13] and [38], respectively. 

Our general theory of the algebraic specification of computable data types 
analysed the relationship between computability of abstract data types and equa- 
tional specifications. Between 1979 and 1995 we published a series of papers that 
classified the computable, semicomputable and cosemicomputable data types us- 
ing algebraic specifications (see, for example, Bergstra and Tucker ). We 
proved several theorems that show that all computable data types have specifi- 
cations that are very simple and small, or have good term rewriting properties, 
but always with hidden functions. Work has continued on this subject, refining 
notions such as finality (e.g., including Meseguer and Goguen [16] and Moss, 
Meseguer and Goguen, [31]), and on open questions (e.g., by Marongiu and 
Tulipani [26] and, most recently, by Khoussainov [23,24]).

We have returned to the foundations of the subject in [7], tackling the speci- 
fication of basic data types such as the rational numbers, and we continue here. 
First, we will select a core set of features which we define to be the elementary 
algebraic specifications. These are close to the basic techniques of the ADJ Group 
of the 1970s. 

The set Q of rational numbers is a number system designed to denote mea- 
surements. They are used to define the real and complex numbers via approxi- 
mation. The rationals are the numbers with which we make finite computations. 
Algebras made by equipping Q with some constants and operations we call rational
arithmetics. We usually calculate with the algebra $(Q \mid 0, 1, +, -, \cdot, {}^{-1})$ which
is called the field of rational numbers when the operations satisfy certain stan- 
dard axioms. 

In addition to rational arithmetics, of particular interest are field extensions 
of the rational number field. Through Galois Theory, field extensions play a 
fundamental role in our understanding of the algebra of numbers, including the 
theory of equations ([12,34]). One important field extension is the field of rational
complex numbers, based on the set

$$Q(i) = \{p + i \cdot q \mid p, q \in Q\}.$$

This has special operations such as complex conjugation $cc(p + i \cdot q) = p - i \cdot q$.

The algebras of rational numbers, such as the field and its extensions by real 
and complex numbers, are among the truly fundamental data types. Despite the 
fact they have been known and used for over two millennia, they are neglected in
the modern theory of data types. After over 30 years of data type theory, many
questions about rational arithmetics and their extensions are open. 

Now the common rational arithmetics and field extensions are all computable 
algebras. Indeed, in the theory of computable rings and fields there is a wealth 
of constructions of computable algebras that start with the rationals and the fi- 
nite fields: see the introduction and survey Stoltenberg-Hansen and Tucker [36]. 
Therefore, according to our general theory of algebraic specifications of com- 
putable data types they have various equational specifications under initial and 
final algebra semantics. Computable algebras even have equational specifications 
that are also complete term rewriting systems ([5]). However, these general
specification theorems for computable data types involve hidden functions and are
based on equationally definable enumerations of data. 

Recently, in Moss [30], algebraic specifications of the rationals were considered.
Among several interesting observations, Moss showed that there exists an
equational specification of the ring of rationals (i.e., without division) with just
one unary hidden function. He used a special enumeration technique based on
a remarkable enumeration theorem for the rationals in Calkin and Wilf [10]. He
also gave specifications of other rational arithmetics and asked if hidden functions
were necessary. In [7] we proved that there exists a finite equational specification
under initial algebra semantics, without hidden functions, of the field of rational 
numbers. The pursuit of this result leads to a thorough axiomatic examination 
of the divisibility operator, in which some interesting new axioms and models 
were discovered, and related results on fields and other rational arithmetics. 

In particular, here we prove: 


Theorem 1. There exists a finite equational specification under initial algebra
semantics, and without hidden functions, of the algebra

$$(Q(i) \mid 0, 1, i, +, -, \cdot, {}^{-1}, cc)$$

of rational complex numbers with field and conjugate operations that are all total.


The structure of the paper is this. In Section 2 we discuss the basics of
specification theory and define the elementary algebraic specifications. In Section
3 we describe the algebras and the axioms we will use to specify them. In
Section 4 we prove the main theorem. Finally, in Section 5 we discuss some open
problems.

This paper, and our [7,8,9], can be read independently but they are better
viewed as a sequel to Bergstra and Tucker [4,5], which contain many
complementary results.


2 Elementary Algebraic Specifications 


Since the first examples of algebraic specifications of data types in the 1970s, 
there has been a steady growth in the features that one may add to the basic 
techniques to be found in the early ADJ papers such as [20]. The new techniques 
have been introduced for a number of obvious reasons: they have been found to
be natural, or useful, or necessary to solve problems, or they have been used to
extend or explore simpler techniques. The development of languages and tools 
(such as OBJ, ASF-SDF, Maude, CASL, etc.) for algebraic specification has 
increased the number and complexity of features in use. 

So just what are the basic elements of this subject? 


2.1 What Are the Elementary Algebraic Specifications? 


Algebraic specification starts with the idea of modelling - e.g., data, processes, 
syntax, hardware, etc. - using sets and functions. Wherever there are sets and 
functions there are algebras! For example, the sets $X, Y$ and function $f : X \to Y$
are combined to form the many sorted algebra $(X, Y \mid f)$. A particular algebra
A is a mathematical model of a specific concrete representation of the system 
equipped with concrete operations. The need to understand the system, its rep- 
resentations and the extent to which they are unique leads to the concepts of (i) 
axiomatic theories for the chosen operators, and (ii) homomorphisms and iso- 
morphisms for the comparison of algebras. The simplest axioms are equations. 
The simplest deductions are those of equational logic based on the rewriting
of terms. Any system can be modelled in this way. Therefore, we define the basic 
elements as follows. 


Definition 1. An algebraic specification $(\Sigma', E')$ of a $\Sigma$ algebra $A$ is elementary
if it involves only


1. A many sorted signature $\Sigma'$ that is non-void. A signature is non-void if there
is a closed term of every sort.

2. A set $E'$ of equations or conditional equations.

3. An initial algebra semantics such that $I(\Sigma', E')|_{\Sigma} \cong A$.


In particular, the elementary specifications require total functions, allow hidden 
functions and sorts, and may or may not be complete term rewriting systems. 
Clearly, there are plenty of restrictions in force: see 

A standard way of proving an elementary specification correct is to check these
properties:


Definition 2. An algebraic specification $(\Sigma', E')$ of a $\Sigma$ algebra $A$ satisfies
Goguen’s conditions if the following are true:


No Junk or Minimality The algebra $A$ is $\Sigma$-minimal.
No Confusion or Completeness For all closed $\Sigma$ terms $t, t'$, we have

$$A \models t = t' \text{ if, and only if, } E' \vdash t = t'.$$

In particular, the Goguen conditions imply that

$$I(\Sigma', E')|_{\Sigma} \cong A.$$

What makes these features elementary?

The purpose of developing a specification is to model, analyse and under- 
stand. In simple terms, these algebraic tools are fundamental for any modelling 
using sets and functions: they are used to abstract and analyse the properties of 
an idea, component, or system. One chooses a set of operators and postulates a 
set of laws they satisfy; the laws are expressed as equations or conditional equa- 
tions. The terms express all possible operations that can be derived by combining 
operations, and the equational identities express the consequent facts about the 
model. The term rewriting is a completely basic mechanism for both abstract 
reasoning and computation. This view suggests the elementary character of the 
equations and that we cannot make do with less. There is also an argument that 
they need extension in special circumstances. 

Now, the whole modelling and specification process for elementary specifica- 
tions is mathematically robust in the sense that the syntax and semantics have 
virtually no special conditions, neither subtle nor obvious.

In modelling using an elementary algebraic specification one simply starts 
playing with operators, equations and rewrites. There are no side conditions, 
side effects, and semantic errors to beware. The elementary algebraic specifi- 
cations work simply in all cases. The only mistakes possible are mistakes in 
understanding what one is trying to model. 

In our algebraic theory of computable data types, there are many results 
that show that if a many sorted algebra can be implemented on a computer 
then it possesses a range of elementary equational specifications with remarkable 
properties. Technically, all computable algebras can be specified with hidden 
functions, and all semicomputable algebras can be specified with hidden sorts 
and functions. In general this is the best possible. One theorem provides complete 
term rewriting systems ([5], Terese [37]). We need not worry about their power
because: 

The elementary algebraic specifications can specify everything that can be 
implemented on a computer in principle. 


2.2 What Are the Non-elementary Algebraic Specifications? 


What features have we excluded from Definition 1 and hence have “declared”
to be not elementary, and why? 

We have excluded final algebra semantics because final algebras of equational 
specifications do not always exist and there are different interpretations possible 
(see Moss, Meseguer and Goguen [31]). 

We have excluded loose semantics because we are focussed on specifying 
algebras up to isomorphism rather than classes of possible models. 

We have excluded the following, too: 


— Generalisations of equations One can use first order formulae that are 
“close” to equations such as Horn formulae. Since we exclude relations the 
Horn clauses are excluded. The multi-equations studied by Adámek et al.
[1] are also not simple enough.

— Partial functions Partiality is an essential aspect of computation. However,
its logic is awfully complicated. Total functions are not without problems
when specifying the stack, as we have seen in our [6]. However, as we showed 
in [7], it is not a problem to use total functions to specify the inverse on the 
rationals. 

— Errors and exceptions The addition of new types of data, such as error 
flags, to familiar old friends, such as the natural numbers equipped with the 
predecessor $n - 1$, leads to difficult specifications and semantic complications.

— Subsorts and order sorted algebras Subsorts occur naturally and help 
with modelling subtyping, errors, etc. However, there are different theories 
none of which are simple: see, for example, Mosses on unified algebras [32] 
or the survey [18].

— Higher order The higher-order theory is complicated from the start though 
it does possess a nice generalisation of the standard theory (see Meinke [27]). 

— Empty sorts Empty sorts are tricky: see Goguen and Meseguer [17].

— Priorities Priorities for the equational rules are technically natural in de- 
veloping software tools for algebraic specifications. However, they lead to 
complications since their term algebra representations do not satisfy the 
equations in general and must be considered pre-initial in some sense. 

— Modularity Our elementary specifications are flat and do not have imports. 
Even the most simple notion of import introduces involved operations for 
flattening, see Rees et al [33]. 

— Parameterization There are many alternate treatments of parameteriza- 
tion, none of them simple. 


Many of the features and techniques above that we have declared to be not 
elementary we certainly consider important. For example, features such as par- 
tiality and higher order equations are semantically fascinating and challenging 
to study, and are necessary to meet desires for certain kinds of specifications. 
However, they are not elementary. 

Some of the features we have chosen for exclusion, such as subsorts, may seem
less complicated to the user: they are not. For example, consider distinguishing
the set $Q_{\neq 0}$ of non-zero elements of Q using a subsort of a signature for the
field of rationals: let nzrat be the subsort of the sort rat. What is the type of
the rational function 1/(1 + x.x)? Is it nzrat and, if so, why? This kind of typing
problem is complicated, for if it were decidable then the diophantine problem over
Q would be decidable; this remains an important open problem in computability
theory. The types of open terms are problematic, and so are types of equations.
Is the equation (1 + x.x)/(1 + x.x) = 1 usable as an axiom? If so, what does
that imply about its type, or should that be given explicitly? But what can the
type be? Taking type nzrat represents the axiom that this denominator is never
0, which to prove may require this very axiom; taking as type rat may be a type
error.

We believe that none of the features in our list are elementary for users, and 
that combining them leads to significant complications.


2.3 Technical Preliminaries on Algebraic Specifications


We assume the reader is familiar with using equations and conditional equations 
and initial algebra semantics to specify data types. Some accounts of this are: 
ADJ [20], Meseguer and Goguen [29], or Wirsing [40]. 

The theory of algebraic specifications is based on theories of universal al- 
gebras (e.g., Wechler [39], Meinke and Tucker [28]), computable and semicom- 
putable algebras (Stoltenberg-Hansen and Tucker [35]), and term rewriting (Klop 
[25], Terese [37]). 

We use standard notations: typically, we let $\Sigma$ be a many sorted signature
and $A$ a total $\Sigma$ algebra. The class of all total $\Sigma$ algebras is $Alg(\Sigma)$ and the
class of all total $\Sigma$-algebras satisfying all the axioms in a theory $T$ is $Alg(\Sigma, T)$.
The word ‘algebra’ will mean total algebra.


3 Specifications for Rational Complex Numbers 


3.1 Algebraic Specifications of the Rationals 


We will build our specifications in stages. The primary signature $\Sigma$ is simply
that of the field of rational numbers:


signature Σ
sorts field
operations
  0 : → field;
  1 : → field;
  + : field × field → field;
  − : field → field;
  · : field × field → field;
  ⁻¹ : field → field
end


The first set of axioms is that of a commutative ring with 1, which establishes
the standard properties of $+$, $-$, and $\cdot$.


equations CR

$(x + y) + z = x + (y + z)$ (1)
$x + y = y + x$ (2)
$x + 0 = x$ (3)
$x + (-x) = 0$ (4)
$(x \cdot y) \cdot z = x \cdot (y \cdot z)$ (5)
$x \cdot y = y \cdot x$ (6)
$x \cdot 1 = x$ (7)
$x \cdot (y + z) = x \cdot y + x \cdot z$ (8)

end

Our first set SIP of axioms for ${}^{-1}$ contains the following, which we call the
strong inverse properties. They are “strong” because they are equations
involving ${}^{-1}$ without any guards, such as $x \neq 0$:


equations SIP

$(-x)^{-1} = -(x^{-1})$ (9)
$(x \cdot y)^{-1} = x^{-1} \cdot y^{-1}$ (10)
$(x^{-1})^{-1} = x$ (11)

end


Our specification $CR \cup SIP$ draws attention to division by zero:

Lemma 1. The following equation is provable from $CR \cup SIP$:

$$0^{-1} = 0.$$

In particular, in our [7] (Theorem 3.5) we add a single axiom $L$ to prove:


Theorem 2. There exists a finite elementary equational specification $(\Sigma, CR \cup
SIP \cup L)$, without hidden functions and under initial algebra semantics, of the
rational numbers with field operations that are all total.


In [7] we also add to $CR \cup SIP$ the restricted inverse law (Ril),


equations Ril

$x \cdot (x \cdot x^{-1}) = x$ (12)

end

which, using commutativity and associativity, expresses that $x \cdot x^{-1}$ is 1 in the
presence of $x$.

Whilst the initial algebra of $CR$ is the ring of integers, we find that


Lemma 2. The initial algebra of CR + SIP + Ril is a computable algebra but 
it is not an integral domain. 


The models of CR + SIP + Ril are algebras with nice properties, in spite of 
not being fields nor even integral domains. 


Definition 3. A model of CR+ SIP + Ril is called a meadow. 


All fields are clearly meadows but not conversely (as the initial algebra is not 
a field). 
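
A small finite illustration, not taken from the text, may help here. The sketch below assumes that SIP consists of the three equations $(-x)^{-1} = -(x^{-1})$, $(x \cdot y)^{-1} = x^{-1} \cdot y^{-1}$ and $(x^{-1})^{-1} = x$, and that Ril is $x \cdot (x \cdot x^{-1}) = x$. Under these assumptions it checks by brute force that the commutative ring of integers modulo 6, with every element taken as its own inverse (a choice made for this example only), satisfies SIP and Ril while having zero divisors. The function names are hypothetical.

```python
from itertools import product

N = 6  # Z/6Z is a commutative ring with 1, so the CR axioms hold
ELEMS = range(N)


def mul(x, y):
    return (x * y) % N


def neg(x):
    return (-x) % N


def inv(x):
    # Assumed inverse for this example: every element is its own inverse.
    return x % N


def satisfies_sip_and_ril() -> bool:
    """Check the assumed SIP equations and Ril on all elements of Z/6Z."""
    for x, y in product(ELEMS, repeat=2):
        if inv(neg(x)) != neg(inv(x)):
            return False
        if inv(mul(x, y)) != mul(inv(x), inv(y)):
            return False
        if inv(inv(x)) != x:
            return False
        if mul(x, mul(x, inv(x))) != x:
            return False
    return True


# Pairs of non-zero elements whose product is zero.
ZERO_DIVISORS = [(x, y) for x, y in product(range(1, N), repeat=2) if mul(x, y) == 0]
```

Since 2 · 3 = 0 in this ring, it is not an integral domain, so it gives one concrete model of the equations that is not a field.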


Theorem 3. For any closed terms $t, t' \in T(\Sigma)$, the following are equivalent:


1. $t = t'$ is true in all totalised fields.
2. $t = t'$ is true in all totalised meadows.


3.2 Algebraic Specifications of the Rational Complex Numbers


We add to the field signature $\Sigma$ the complex conjugate operation $cc : field \to
field$ to form the signature $\Sigma_{cc}$. Also to this signature we add the constant
$i : \to field$ to form the signature $\Sigma_{cc,i}$. Consider these equations over the
signature $\Sigma_{cc,i}$:


equations CC

$i \cdot i = -1$ (13)
$cc(1) = 1$ (14)
$cc(i) = -i$ (15)
$cc(x + y) = cc(x) + cc(y)$ (16)
$cc(x \cdot y) = cc(x) \cdot cc(y)$ (17)
$cc(-x) = -cc(x)$ (18)
$cc(x^{-1}) = (cc(x))^{-1}$ (19)
$cc(x \cdot x^{-1}) = x \cdot x^{-1}$ (20)

end


3.3  Totalised Fields and Algebras Satisfying the Specifications 


The axioms of a field simply add to $CR$ the following: the general inverse law
(Gil)

$$x \neq 0 \implies x \cdot x^{-1} = 1$$

and the axiom of separation (Sep)

$$0 \neq 1.$$

Thus, let $(\Sigma, T_{field})$ be the axiomatic specification of fields, where

$$T_{field} = CR \cup Gil \cup Sep.$$

Clearly, this specification is not elementary as it contains negations; and, as it
is commonly applied, allows partial functions in its models.

However, by definition, the class $Alg(\Sigma, T_{field})$ is the class of total algebras
satisfying the axioms in $T_{field}$. For emphasis, we refer to these algebras as
totalised fields.

For all totalised fields $A \in Alg(\Sigma, T_{field})$ and all $x \in A$, the inverse $x^{-1}$ is
defined. In particular, $0^{-1}$ is defined. The actual value $0^{-1} = a$ can be anything.

However, it is convenient to set $0^{-1} = 0$ (see [7], and compare, e.g., Hodges
[21], p. 695). We use the specification $CR \cup SIP$ which forces $0^{-1} = 0$ (Lemma
1). A field with $0^{-1} = 0$ we call a 0-totalised field.

The main $\Sigma$-algebras we are interested in are these: first,

$$Q_0 = (Q \mid 0, 1, +, -, \cdot, {}^{-1})$$

where the inverse is total

$$x^{-1} = \begin{cases} 1/x & \text{if } x \neq 0; \\ 0 & \text{if } x = 0. \end{cases}$$

This total algebra satisfies the axioms of a field $T_{field}$ and is a 0-totalised field
of rationals. Next, we are interested in the 0-totalised field extension $Q_0(i)$ and
its expansion by conjugation $Q_0(i, cc)$.
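
The totalised inverse is easy to experiment with using exact rational arithmetic. The following sketch is an illustration only: it uses Python's `fractions.Fraction` for Q, the names `inv0`, `SAMPLES` and `check_laws` are hypothetical, and the strong inverse properties are assumed to be $(-x)^{-1} = -(x^{-1})$, $(x \cdot y)^{-1} = x^{-1} \cdot y^{-1}$ and $(x^{-1})^{-1} = x$. It tests these equations, the restricted inverse law $x \cdot (x \cdot x^{-1}) = x$ and the general inverse law on a small sample of rationals, which is a spot check and not a proof.

```python
from fractions import Fraction
from itertools import product


def inv0(x: Fraction) -> Fraction:
    """Total inverse on the rationals with the convention 0^{-1} = 0."""
    return Fraction(0) if x == 0 else 1 / x


# A small, arbitrary sample of rationals including zero.
SAMPLES = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]


def check_laws() -> bool:
    """Spot-check the assumed SIP equations, Ril and Gil on SAMPLES."""
    for x, y in product(SAMPLES, repeat=2):
        sip = (
            inv0(-x) == -inv0(x)
            and inv0(x * y) == inv0(x) * inv0(y)
            and inv0(inv0(x)) == x
        )
        ril = x * (x * inv0(x)) == x
        gil = x == 0 or x * inv0(x) == 1
        if not (sip and ril and gil):
            return False
    return True
```

Note that the unguarded equations hold at zero precisely because of the convention $0^{-1} = 0$.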


4 Proof of Main Theorem 


Theorem 4. There exists a finite elementary equational specification $(\Sigma_{cc,i}, E)$,
without hidden functions, of the algebra $Q_0(cc, i)$ of rational complex numbers
with field and conjugate operations that are all total, under initial algebra
semantics. That is,

$$I(\Sigma_{cc,i}, E) \cong Q_0(cc, i).$$


Proof. Let $(\Sigma, E)$ be any elementary equational specification without hidden
functions of the 0-totalised field of rationals $Q_0 = (Q \mid 0, 1, +, -, \cdot, {}^{-1})$, so $I(\Sigma, E)
\cong Q_0$. By Theorem 2 there is such an elementary specification. The strategy
is to build a specification of $Q_0(cc, i)$ using real and imaginary parts, which are
rationals.

Let $\Sigma_{cc}$ be the field signature $\Sigma$ extended by the complex conjugation
operator

$$cc : field \to field.$$

First, we look at conjugation on Q.

Conjugation on Q. Conjugation on $Q \subseteq C$ is the identity function, $cc(r) = r$
for $r \in Q$. Let $Q_0(cc)$ be the 0-totalised field of rational numbers extended by
conjugation $cc$. Let $E^+$ be the result of applying the following transformation to
the equations in $E$: for each variable $x$ in each equation of $E$ substitute

$$\tfrac{1}{2}(x + cc(x)),$$

where $\tfrac{1}{2} = (1 + 1)^{-1}$. When applied to a complex number, the formula calculates
its real part, so when applied to a rational complex number from $Q_0(cc, i)$ it
returns a rational number that would satisfy the equations of $E$.
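
The effect of the substituted term can be checked on concrete values. The sketch below is illustrative only and assumes a representation of $p + i \cdot q$ as a pair `(p, q)` of `Fraction` values; the helper names and the hard-coded constant `HALF` (standing for $(1 + 1)^{-1}$) are hypothetical. It evaluates $\tfrac{1}{2}(x + cc(x))$ and shows that the result is the real part, and that rationals are left unchanged.

```python
from fractions import Fraction

# A rational complex number p + i*q is represented as the pair (p, q).


def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def mul(x, y):
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])


def cc(x):
    """Complex conjugation: cc(p + i*q) = p - i*q."""
    return (x[0], -x[1])


HALF = (Fraction(1, 2), Fraction(0))  # stands for (1 + 1)^{-1}


def real_part(x):
    """Evaluate the substituted term (1/2) * (x + cc(x))."""
    return mul(HALF, add(x, cc(x)))


z = (Fraction(3, 4), Fraction(-5, 7))  # arbitrary rational complex number
r = (Fraction(2, 9), Fraction(0))      # arbitrary rational number
```

Here `real_part(z)` discards the imaginary component, while `real_part(r)` returns `r` itself, which is why the transformed equations constrain only rational values.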

Now define

$$E^+_{cc} = E^+ \cup \{cc(x) = x\} \cup \{\tfrac{1}{2}(x + x) = x\}.$$

Lemma 3. $I(\Sigma_{cc}, E^+_{cc}) \cong Q_0(cc)$

Proof. We use Goguen’s conditions.

No Junk The algebra $Q_0(cc)$ is clearly $\Sigma_{cc}$ minimal since $Q_0$ is $\Sigma$ minimal.

No Confusion By inspection, $Q_0(cc) \models E^+_{cc}$. We have to show completeness.
First, we make the observation that

$$E^+_{cc} \vdash E$$

since we can derive the equations of $E$ by substituting back using the two axioms
added as follows:

$$E^+_{cc} \vdash \tfrac{1}{2}(x + cc(x)) = \tfrac{1}{2}(x + x) = x.$$

Now suppose that $Q_0(cc) \models t_1 = t_2$ for any closed terms. By using the axiom
$cc(x) = x$ in $E^+_{cc}$, we can delete $cc$ in the terms $t_1, t_2 \in T(\Sigma_{cc})$ to form
$t_1', t_2' \in T(\Sigma)$ such that

$$E^+_{cc} \vdash t_1 = t_1' \text{ and } E^+_{cc} \vdash t_2 = t_2'.$$

We know that

$$Q_0 \models t_1' = t_2'$$

and since $(\Sigma, E)$ is an initial algebra specification, we have

$$E \vdash t_1' = t_2'.$$

Hence, by the above observation,

$$E^+_{cc} \vdash t_1' = t_2'$$

and by applying $cc(x) = x$ as often as $t_1, t_2$ contain occurrences of $cc$

$$E^+_{cc} \vdash t_1 = t_2.$$

This completes the argument.
Let us replace the equation $cc(x) = x$ in $E^+_{cc}$ by the set

$$CCid = \{cc(t) = t \mid t \in T(\Sigma)\}$$

of all its closed $\Sigma$-term instances. We define:

$$E^{++}_{cc} = E^+ \cup CCid \cup \{\tfrac{1}{2}(x + x) = x\}.$$

Lemma 4. $I(\Sigma_{cc}, E^{++}_{cc}) \cong Q_0(cc)$

Proof. Replacing an equation by the set of all its closed instances does not change
the initial algebra. In this case the $cc$'s can be removed anyway.


Now we consider the complex numbers.

Conjugation on Q(i). Now consider the signature $\Sigma_{cc,i} = \Sigma_{cc} \cup \{i : \to field\}$,
and the algebra $Q_0(cc, i)$ of rational complex numbers. We define the set

$$T = CR \cup SIP \cup Ril \cup CC \cup \{2 \cdot 2^{-1} = 1\}.$$

Theorem 5. $I(\Sigma_{cc,i}, E^+ \cup T) \cong Q_0(cc, i)$


Proof. We verify the Goguen conditions. 

No Junk Clearly, Q₀(cc, i) is Σ_cc,i-minimal. 

No Confusion By inspection, 


Q₀(cc, i) ⊨ E* ∪ T. 


For this we use the fact that the substitution of ½(x + cc(x)) for each x in E 
guarantees that E is restricted to rational values, which are the real parts of x 
when evaluated in Q₀(cc, i). 

To complete the argument we need some lemmas. 


Lemma 5. For each t ∈ T(Σ_cc), we have T ⊢ cc(t) = t. 
Proof. This is an easy induction on t. 


Some useful consequences of this lemma are as follows. First, T ⊢ CCid. 
Furthermore, we may deduce 


T ⊢ ½(x + x) = x 


using CR and the axiom 2 · 2⁻¹ = 1 in T. So we also have 


E* ∪ T ⊢ E**_cc. 


Now suppose that Q₀(cc, i) ⊨ t₁ = t₂ with t₁, t₂ ∈ T(Σ_cc,i). To show 
completeness we have to show that E* ∪ T ⊢ t₁ = t₂. 








Lemma 6. For any closed term t ∈ T(Σ_cc,i) there are terms p, q ∈ T(Σ) such 
that 


T ⊢ t = p + i · q. 


Proof. We prove this by induction on terms. 


Basis By the ring axioms of CR, the constants are as follows: 


0 = 0 + i · 0 
1 = 1 + i · 0 
i = 0 + i · 1. 


Induction Step There are five cases, one for each operation. The cases of +, −, · 
are easy - here is one: 
Let t = t₁ · t₂ and suppose as induction hypothesis: 


T ⊢ t₁ = p₁ + i · q₁ and T ⊢ t₂ = p₂ + i · q₂. 


Then, substituting, we calculate: 


T ⊢ t = t₁ · t₂                                          by assumption 
  ⊢ t = (p₁ + i · q₁) · (p₂ + i · q₂)                    by induction hypothesis 
  ⊢ t = (p₁ · p₂ − q₁ · q₂) + i · (q₁ · p₂ + q₂ · p₁)    by axioms in CR and i · i = −1 


The other cases are more interesting. 
Let t = cc(t₀) and suppose as induction hypothesis: 


T ⊢ t₀ = p₀ + i · q₀. 


Then, substituting, we calculate: 


T ⊢ t = cc(t₀)                  by assumption 
  ⊢ t = cc(p₀ + i · q₀)         by induction hypothesis 
  ⊢ t = cc(p₀) − i · cc(q₀)     by axioms of CC in T 
  ⊢ t = p₀ + i · (−q₀)          by Lemma 5 


Let t = r⁻¹ and suppose as induction hypothesis: 


T ⊢ r = p + i · q. 


Then, substituting and writing a/b for a · b⁻¹, we calculate: 


T ⊢ t = r⁻¹                                                  by assumption 
  ⊢ t = 1/(p + i·q)                                          by induction hypothesis 
  ⊢ t = 1/(p + i·q) · (1/(p + i·q) · 1/(1/(p + i·q)))        by Ril in T 
  ⊢ t = 1/(p + i·q) · (1/(p + i·q) · (p + i·q))              by SIP 
  ⊢ t = 1/(p + i·q) · (p + i·q)/(p + i·q)                    by axioms of CR 
  ⊢ t = 1/(p + i·q) · cc((p + i·q)/(p + i·q))                by axiom cc(x · x⁻¹) = x · x⁻¹ 
  ⊢ t = 1/(p + i·q) · cc(p + i·q)/cc(p + i·q)                by axiom of CC 
  ⊢ t = 1/(p + i·q) · (cc(p) − i·cc(q))/(cc(p) − i·cc(q))    by axioms of CC 
  ⊢ t = (p − i·q)/((p + i·q) · (p − i·q))                    by Lemma 5 
  ⊢ t = p/(p² + q²) + i · (−q)/(p² + q²)                     by axioms of T 


which has the required form. 

Finally, to finish the completeness, suppose Q₀(cc, i) ⊨ t₁ = t₂ with t₁, t₂ ∈ 
T(Σ_cc,i). By Lemma 6 we suppose that 


T ⊢ t₁ = p₁ + i · q₁ and T ⊢ t₂ = p₂ + i · q₂, 


where p₁, p₂, q₁, q₂ ∈ T(Σ). Hence, 


Q₀ ⊨ p₁ = p₂ and Q₀ ⊨ q₁ = q₂. 


By Lemma 4, I(Σ_cc, E**_cc) ≅ Q₀(cc) and so by completeness 


E**_cc ⊢ p₁ = p₂ and E**_cc ⊢ q₁ = q₂. 


Next, thanks to Lemma 5, a consequence is 


E* ∪ T ⊢ E**_cc. 


Therefore, 


E* ∪ T ⊢ p₁ = p₂ and E* ∪ T ⊢ q₁ = q₂, 


and so we are done with 


E* ∪ T ⊢ p₁ + i · q₁ = p₂ + i · q₂. 


This completes the proof of Theorem 5. 


And hence the main theorem. 
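
The normal form p + i · q of Lemma 6 and the real-part term ½(x + cc(x)) can be checked on concrete values. The following is a minimal sketch in Python, not part of the proof. The class name QC and the helper real_part are hypothetical, and the model assumes exact rationals with the 0-totalised inverse 0⁻¹ = 0. 

```python
from fractions import Fraction


class QC:
    """Hypothetical model of Q0(cc, i): a pair (p, q) stands for p + i*q."""

    def __init__(self, p, q=0):
        self.p, self.q = Fraction(p), Fraction(q)

    def __eq__(self, other):
        return (self.p, self.q) == (other.p, other.q)

    def __add__(self, other):
        return QC(self.p + other.p, self.q + other.q)

    def __mul__(self, other):
        # (p1 + i*q1)*(p2 + i*q2) = (p1*p2 - q1*q2) + i*(q1*p2 + q2*p1)
        return QC(self.p * other.p - self.q * other.q,
                  self.q * other.p + other.q * self.p)

    def cc(self):
        # Conjugation: cc(p + i*q) = p - i*q
        return QC(self.p, -self.q)

    def inv(self):
        # 0-totalised inverse: inv(0) = 0, otherwise cc(z) / (p*p + q*q)
        n = self.p * self.p + self.q * self.q
        if n == 0:
            return QC(0)
        return QC(self.p / n, -self.q / n)


def real_part(z):
    # The term 1/2 * (z + cc(z)) that is substituted for each variable of E
    half = QC(2).inv()
    return half * (z + z.cc())
```

For z = QC(3, 4) the inverse is 3/25 + i · (−4/25), which has the shape produced by the last step of the inverse case, and real_part(z) returns the rational 3. 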


5 Concluding Remarks 


There are open questions left over from the study of the rationals. For example, 
the following problem is quite basic: 


Problem 1. Is there a finite elementary equational specification of the 0-totalised 
field Qo, without hidden functions and under initial algebra semantics, which 
constitutes a complete term rewriting system? 


We know from our paper [5] that there exists such a specification with hidden 
functions. 

However, questions proliferate as one reflects on the number of algebras using 
the rational numbers ([36|). For example, we do not know the answer to these 
simple questions. 


Problem 2. Is there a finite elementary equational specification of the field Q₀(i) 
of rational complex numbers, without any hidden functions? 


Problem 3. Is there a finite elementary equational specification of the algebra 
Q₀(i, cc) (without further hidden functions) which constitutes a complete term 
rewriting system? 


The rational numbers constitute the data type for measurement with a finite 
system of units and subunits. The real and complex numbers are constructed 
as completions of the rationals, using the idea of the approximation of 
measurements with unlimited accuracy. The real and complex numbers are the basis for 
a vast range of data types used to model physical systems by means of 
measurement and equations (e.g., algebras of sequences, streams and signals, scalar and 
vector fields, continuous functions, probability distributions, and their abstrac- 
tions). In general terms the data in these algebras are continuous data and they 
are built by some completion process from subalgebras containing discrete data, 
as the reals are made from the rationals. 


Problem 4. To create a comprehensive theory of computing, specifying and rea- 
soning with systems based on continuous data. Ideally, the theory should integrate 
discrete and continuous data. 


At present this is a huge and complicated task as computation, specification 
and verification on continuous data are all active research areas. In fact, the task 
is a challenge in the special case of real numbers, see [7] for a discussion. 


Abstract. The semantic and logical treatment of recursion and of recursive 
definitions in computer science, in particular in requirements specification, in 
programming languages and related formalisms such as λ-calculus or 
recursively defined functions, is one of the key issues of the semantic theory of 
programming and programming languages. As was recognised already 
in the early days of the theory of programming, there are several options to 
formalise and give a theory of the semantics of recursive function 
declarations. In different branches of computer science, logic, and 
mathematics various techniques for dealing with the semantics of recursion 
have been developed and established. We outline, compare, and briefly 
discuss advantages and disadvantages of these different possibilities, illustrate 
them by a simple running example, and relate these approaches. 


1 Introduction 


In informatics, recursion appears - explicitly or implicitly - everywhere. Throughout 
this paper we study the following problem pattern. We assume that we are given a 
heterogeneous algebra, also called a computation structure in the foundation of 
abstract data types, consisting of a family of carrier sets (corresponding to data types) 
and a family of functions/operations over them. We furthermore assume that we are 
given a logical theory for which the algebra is a model. This theory need not 
necessarily be logically complete. In this case, there might exist many further 
essentially different models for the given theory. 

We study the introduction of an additional function identified by a fresh function 
symbol f into this algebra and its logical theory, in particular. We carry out this 
extension by first fixing the functionality of the introduced function. The sorts and the 
associated carrier sets that form its domain and range determine it. After fixing the 
functionality we define the values (the “graph”) of the function by an explicit, 
possibly recursive equation f(x) = E. 


K. Futatsugi et al. (Eds.): Goguen Festschrift, LNCS 4060, pp. 476-496, 2006. 
© Springer-Verlag Berlin Heidelberg 2006 


From Chaos to Undefinedness 477 


A signature, a set of axioms, and a set of inference rules provide a logical theory. 
The signature provides symbols for constants, functions, logical variables, and, in the 
case of heterogeneous theories, of sorts (also called types) as well. Based on the 
signature we form terms and formulas. The set of axioms Γ and inference rules induce 
a deduction relation ⊢. We express by Γ ⊢ Φ the proposition that the formula Φ follows 
logically from the axioms in Γ by the logical inference rules of the logical theory. 

With logical theories we associate models. A model is an algebra. We work only 
with total algebras in the following. These are algebras where all functions are total. 
In the case of heterogeneous theories it is a heterogeneous algebra, which contains a 
“carrier” set for each sort of the signature, a data value for each constant and a 
function for each function symbol. This family of sets and functions allows us to 
interpret terms and formulas. Terms are interpreted by mapping them onto elements 
of the algebra's carrier sets, which represent their values. Formulas are interpreted by 
truth values. 

For a model we require that all formulas Φ for which the proposition Γ ⊢ Φ holds be 
mapped by the interpretation to the truth-value true. For each model we call the set of 
elements of the carrier sets for each of which a term exists whose interpretation yields 
this element the standard elements (also called the term generated elements). The 
other elements are called non-standard elements. A model that contains only standard 
elements is called a standard or a term-generated model. An equation of the form f(x) 
= E is called recursive (for the function symbol f) if the function symbol f occurs in 
the term E. 

The introduction of a new function symbol by recursion can be studied either in the 
model-theoretic or in the logical setting: 


• In a logical approach, we extend the signature of the given logical theory by 
adding the fresh function symbol f to it. Then we add the specifying equation 
f(x) = E as an axiom. By this approach, we transform the given theory into an 
extended one. Of course, we want to be sure that the step of adding the function 
symbol f and the equation provides a conservative extension of the logical 
theory (meaning that we do not introduce any additional properties to the given 
logical theory) and that the defining equation¹ characterises the function f 
uniquely. 


• In a model-theoretic approach, we extend the algebra by a function called f that 
is required to fulfil the equation f(x) = E. To justify that step we want to be sure 
that such a function actually exists (in other words that the definition of f by the 
equation actually makes sense) and that it is uniquely determined by the 
equation. 


Of course, both approaches are closely related. In the context of the logical approach 
we may consider the set of models of the theory. Then the idea of a conservative 
extension can be used, from which it follows that each of the models of the logical 
theory can be extended in a unique way adding a function called f that fulfils the 
defining equation. 


¹ More precisely, for every ground term t of appropriate sort we would like to be able to reduce 
the term f(t) to a ground term that does not contain the function symbol f. 

Recursion is a funny concept. To “define” a function recursively seems like a 
misuse of the principle of definition, which requires that in a definition a new concept be 
defined exclusively in terms of known concepts. A recursive equation is circular by 
nature while it is a fundamental principle that definitions are required to be 
noncircular. This is in contrast to explicit, noncircular, defining equations where we 
define a function f by an explicit equation 


(*) f(x) = E where the function symbol f does not occur free in the term E 


In the nonrecursive case E is an arbitrary term that may contain the identifier x but 
must not contain the function symbol f. Of course, by the nonrecursive 
equation (*) the function f is uniquely determined. Moreover, it is obvious that such a 
function f always exists. In other words, adding the equation (*) to define the meaning 
of a fresh function symbol f within a logical system never introduces any 
contradiction. The extended theory is always a conservative extension by 
construction. 

The function application f(x) is then only an abbreviation for the term E. Similarly, 
for any term G the term f(G) is just an abbreviation for the term E[G/x]. Here by 
E[G/x] we denote the term formed by substituting the term G for the identifier x in 
the term E. It is obtained from E by replacing all (free) occurrences of the identifier x 
in the term E by the term G. 

This simple situation of explicit (nonrecursive) equational definitions changes 
crucially if we allow for recursive equations. By recursion we declare a function f by 
the recursive equation 


(**) f(x) =E 


where the term E may contain arbitrarily many applications of the function (symbol) f. 
In the case of such recursive definitions of functions the semantic and logical 
treatment gets considerably more complicated. We observe: 


• There need not exist at all a function f that fulfils the equation (**); in 
other words, adding the equation (**) as the defining axiom for the fresh 
function symbol f to an axiomatic theory may introduce a logical 
inconsistency and allow the deduction of a contradiction. 


• There are cases where there exist many distinct functions f that fulfil the 
defining equation (**). So adding the equation (**) as an axiom may lead 
to an extension of a complete theory into one that is incomplete. 


• There are cases where the term f(t) cannot be reduced with the help of the 
axioms and the equation (**) to a term that does not contain the function 
symbol f. Then new “nonstandard” elements that were not representable 
by the terms available so far may be the result of function calls of f, in 
other words, f may be chosen such that it yields results that are not 
elements in the original algebra. As a consequence there may be standard 
models of the extended logical theory that are not standard models for the 
original logical theory. In fact, there are even cases where there does not 
exist a standard model for the original logical theory the carrier sets of 
which form a standard model for the extended theory, if in the standard 
model a total function f that fulfils the equation does not exist. 
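
The first two observations can be illustrated by brute force on a small finite fragment. The following Python sketch is an illustration under stated assumptions only: the names DOMAIN, RANGE and solutions are hypothetical, and restricting arguments to {0, 1, 2} and results to {0, ..., 3} is a simplification of the naturals. It enumerates all function tables and keeps those that fulfil a given equation f(x) = E. 

```python
from itertools import product

DOMAIN = range(3)   # arguments 0, 1, 2 (finite stand-in for the naturals)
RANGE = range(4)    # candidate results 0..3


def solutions(equation):
    """Return all tables f: DOMAIN -> RANGE with f(x) == equation(f, x) for every x."""
    found = []
    for values in product(RANGE, repeat=len(DOMAIN)):
        f = dict(zip(DOMAIN, values))
        if all(f[x] == equation(f, x) for x in DOMAIN):
            found.append(values)
    return found


# f(x) = f(x) + 1: no total function fulfils the equation
no_solution = solutions(lambda f, x: f[x] + 1)

# f(x) = f(x): every function fulfils the equation
many_solutions = solutions(lambda f, x: f[x])

# f(x) = if x = 0 then 0 else f(x-1) + 1 fi: exactly one function fulfils it
unique_solution = solutions(lambda f, x: 0 if x == 0 else f[x - 1] + 1)
```

On this fragment the first equation has no solution, the second has all 4³ = 64 tables as solutions, and the third is fulfilled only by the identity table (0, 1, 2). 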

Recursion has both a logical and an operational flavour. The function f is 
characterised by an equation f(x) = E and this equation defines a rewrite rule f(x) → 
E. This observation is the bridge between a descriptive and an algorithmic 
interpretation of the syntax of a programming and a specification language. In the 
following, we show a number of technical options to treat the semantics of recursion 
by logical and mathematical means. 

One popular way to deal with the semantics of recursion is to give an operational 
semantics for recursively defined functions. Term rewriting can do this. This means 
that we introduce a rewriting relation → on terms defined by rewriting rules. In the 
case of a recursive definition, given a ground term t that contains the recursively 
defined function symbol f we assume that the reduction sequence 


t → t₁ → ... → tₙ 


for the term t either terminates, leading to a term tₙ that cannot be reduced anymore 
(meaning there does not exist a term tₙ₊₁ such that tₙ → tₙ₊₁ holds) such that tₙ is a 
term in normal form, which in our case, in particular, means that tₙ does not contain 
applications of the function symbol f any longer (otherwise we could use the 
reduction f(x) → E), or that the reduction can be continued forever, resulting in an 
infinite reduction sequence. 
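
The two outcomes of a reduction sequence can be sketched in Python for division on the naturals, which serves as the running example of the later sections. This is a minimal sketch under assumptions: the function name reduce_div and the step bound max_steps are hypothetical, and the bound only approximates non-termination, which cannot be observed by running the rules. 

```python
def reduce_div(m, n, max_steps=1000):
    """Rewrite the term div(m, n) with the rules
         div(m, n) -> 0                   if n > m
         div(m, n) -> 1 + div(m - n, n)   if n <= m
    Returns the normal form (a number without the symbol div), or None
    if max_steps rewrite steps did not reach a normal form."""
    pending = 0                      # number of "1 +" contexts built up so far
    for _ in range(max_steps):
        if n > m:
            return pending           # first rule applies: no div symbol is left
        m, pending = m - n, pending + 1
    return None                      # gave up: the reduction may be infinite
```

The call reduce_div(7, 2) reaches the normal form 3 after a few steps, while reduce_div(1, 0) keeps rewriting div(1, 0) to 1 + div(1, 0) and returns None once the bound is exhausted. 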

This operational interpretation provides a strong guideline for the logical and 
denotational treatment of recursion. Given an operational interpretation (be it by an 
interpreter or by a term rewriting system) we have a clear reference for the logical 
treatment of recursion: the logical interpretation should match with the operational 
one, or, formulated in a more demanding way, it should reflect exactly the abstract 
behaviour induced by the operational semantics in terms of rewriting. 

Before we go deeper into the semantic treatment of recursion let us be more 
precise on the used syntax. A recursive equation 


f(x) = E (*) 


is an equation where the expression E contains an arbitrary number of applications 
of the function f. Since the term E is finite it certainly contains only a finite number, 
say n ∈ N, of applications of the function f, say f(G₁), ..., f(Gₙ). In the following we 
sometimes want to identify the instances of the individual applications. This can be 
done by replacing the expression E by an expression E′ such that in E′ each of the 
fresh function symbols f₁, ..., fₙ occurs exactly once (we assume that the identifiers f₁, 
..., fₙ do not occur in the term E) such that the following equation holds: 


E = E′[f/f₁, ..., f/fₙ] 


This way each application is marked individually by a function symbol fₖ that occurs 
exactly once. In a model-theoretic approach we associate with the expression E′ a 
function (by using individual function symbols in each application) 


(1) λ f₁, ..., fₙ: λ x: E′ 


or a function (using the function symbol f for each of the applications): 


(2) λ f: λ x: E 


The function given by (2) is called the functional associated with the recursive 
equation (*). The function given by (1) is called the multicall functional associated 
with the equation (*). If the expression E contains exactly one recursive application 
both coincide. 
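
The difference between the two functionals can be made concrete in Python. The sketch below is an illustration only. It assumes the hypothetical example equation f(x) = if x < 2 then x else f(x−1) + f(x−2) fi, which contains two recursive applications, and the names functional, multicall_functional and fib are not from the text. 

```python
def functional(f):
    """(2): lambda f: lambda x: E, with the symbol f used for both applications."""
    return lambda x: x if x < 2 else f(x - 1) + f(x - 2)


def multicall_functional(f1, f2):
    """(1): lambda f1, f2: lambda x: E', each application has its own symbol."""
    return lambda x: x if x < 2 else f1(x - 1) + f2(x - 2)


def fib(x):
    # A function that fulfils the equation: fib(x) == functional(fib)(x) for all x
    return functional(fib)(x)
```

Passing the same function for both parameters of the multicall functional gives back the functional, so multicall_functional(fib, fib)(10) and functional(fib)(10) both yield fib(10). Passing two different functions shows which application contributes which part of the result. 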

In general, the expression E contains conditional expressions to formulate case 
distinctions. These can be eliminated and transformed (under the assumption that C is 
a Boolean term) into conditional equations by replacing each equation of the form 


f(x) = if C then E₁ else E₂ fi 

into the following two conditional equations: 

C = true ⇒ f(x) = E₁ 
C = false ⇒ f(x) = E₂ 


If we assume that C is two-valued (“tertium non datur”), this translation is an 
equivalence transformation. Moreover we also use rules like: replace the term 


f(if C then E₁ else E₂ fi) 


by the semantically equivalent term 

if C then f(E₁) else f(E₂) fi. 


By furthermore breaking up the terms in E in the equation f(x) = E that way we can 
eliminate all conditional expressions by implications. This way recursive equations 
are transformed schematically into a set of conditional equations and vice versa, as 
long as all conditional equations have the form shown above. 

All problems with recursive declarations disappear if we manage to choose our 
defining equation for f such that it defines the total function associated with the 
function symbol f uniquely. Special cases, where this applies, are definitions that can 
be interpreted inductively. This means that we can find a Noetherian partial order < 
on the domain of the function f such that the following holds: in the recursive 
equation 


P ⇒ f(x) = E′ 

where the expression E′ contains only the recursive applications f(G₁), ..., f(Gₙ) of the 
function f, we can prove that the values of the terms G₁, ..., Gₙ are always elements 
that are in that ordering strictly below the original argument x². In other words, we 
can prove 


P ⇒ Gₖ < x 


for all k. Then the recursive definition can be seen as an infinite set of explicit 
nonrecursive equations for the values of the function associated with the identifier f. 

In the following, we briefly recapitulate and relate the various techniques to give a 
denotational or an axiomatic semantics to recursion. The main goal of this work is to 
integrate and justify a number of methodological decisions when working with a 
theory of program construction. 


² In fact, working with conditional expressions the situation gets slightly more complicated. 
The recursive applications are guarded by conditions. Only if the conditions evaluate to true 
do the modified parameters Gₖ have to be strictly below x. 

This work is motivated to a large extent by discussions in the IFIP Working Group 
2.3, by discussions with Michel Sintzoff, and by the work of Tony Hoare, presented at the 
Marktoberdorf Summer School 1996, towards an integrating framework for the 
semantics and different semantical techniques to treat recursion in specification and 
programming languages. 


2 Simple Running Example: Division 


We demonstrate the various approaches to deal with the semantics of recursion by a 
very simple running example. This example has to be simple enough to keep the 
treatment short and concise but it should include and envisage the typical problems 
that arise when dealing with recursion. With this in mind we choose arithmetic 
division on the naturals as a running example. 

Let N denote the set of natural numbers. Division on the naturals can be 
represented by a partial or by a total function on the naturals: 


div: N × N → N 

or (note that functions are a special case of a relation) by a relation 

Div ⊆ N × N × N 

or (isomorphic to a relation) by a set-valued function 

DIV: N × N → ℘(N) 

or (again isomorphic to a relation) by a predicate 

isdiv: N × N × N → B 

or by a predicate that characterises a set of (partial or total) functions 

ISDIV: (N × N → N) → B 


Strictly speaking according to the foundations of mathematics the function div 
represents also a relation which is a subset of product set N x N x N. Since div is 
supposed to be a function we require that it contains for given numbers x, y at most 
one triple (x, y, z). This requirement for a relation to be called a function is known as 
the Leibniz principle. In other words, a function is a relation that fulfils the Leibniz 
principle. A function with two arguments is called total, if it contains for every pair of 
arguments x and y a triple (x, y, z); otherwise it is called partial. We are free to 
associate with a recursive definition a function or a general relation (allowing us to 
deal also with “nondeterminism” in our model). 

Of course, the critical question is how to specify the result of division in the case 
where its second argument is 0. In the algebra of partial functions there is a simple 
answer. Working with partial functions, we easily express that the result of a function 
application is “not defined” or more precisely “does not exist”. But for partial 
functions we pay the price that we now can write expressions that “do not have a value”. 
For total functions, on the other hand, this simple solution is not available. For them 
for each pair of arguments a result has to be given, which, however, in the case of 
nontermination might be chosen quite arbitrarily. However, for partial functions the 
logical theory of equations is certainly less standard than that for total functions. 

In the case of set-valued functions, relations, or predicates, the idea of partial 
functions is easily incorporated. We may return the empty set for DIV(x, 0), define 
that every triple of the form (x, 0, t) is not in the relation Div, or that the predicate 
isdiv(x, 0, t) always yields false. 

We may also represent undefined by “chaos” and return the set of all naturals for 
DIV(x, 0), include all triples (x, 0, z) in the relation Div and analogously define 
isdiv(x, 0, z) to be always true. In fact, we can choose many constructions between 
these two extremes. If we work with a predicate ISDIV that characterises a set of 
functions we can select any function that behaves like division in the case of the 
second argument being distinct from 0 and arbitrary otherwise. Or we may be more 
restrictive in the case the second argument is 0. All these options are discussed in 
more detail in the following. 
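
The representations of division listed in this section can be put side by side in a small Python sketch. It is an illustration under assumptions: UNIVERSE is a hypothetical finite stand-in for N, a raised exception stands for “no value exists”, and the names DIV_empty and DIV_chaos are not from the text. 

```python
UNIVERSE = range(0, 6)    # finite stand-in for N (an assumption of this sketch)


def div_partial(x, y):
    """Partial function: raises for y == 0, that is, no value exists."""
    if y == 0:
        raise ValueError("div(x, 0) is undefined")
    return x // y


def DIV_empty(x, y):
    """Set-valued function with undefined modelled by the empty set."""
    return set() if y == 0 else {x // y}


def DIV_chaos(x, y):
    """Set-valued function with undefined modelled by chaos: any result is allowed."""
    return set(UNIVERSE) if y == 0 else {x // y}


def isdiv(DIV):
    """Predicate view of a set-valued function: isdiv(x, y, z) holds iff z is in DIV(x, y)."""
    return lambda x, y, z: z in DIV(x, y)


def ISDIV(f):
    """Predicate on functions: f behaves like division whenever y is not 0."""
    return all(f(x, y) == x // y for x in UNIVERSE for y in UNIVERSE if y != 0)
```

Both lambda x, y: 0 if y == 0 else x // y and lambda x, y: x if y == 0 else x // y fulfil ISDIV, which shows that the predicate characterises a set of functions rather than a single one. 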


3 Inductive Definitions, Total Functions 


One simple way to cope with recursive definitions of functions is to follow the ideas 
of primitive recursion and their generalisations to inductive definitions. To do that we 
make sure that the recursive equation that defines the function is based on some kind 
of Noetherian ordering and therefore defines a function uniquely. The advantage of 
this approach is that it allows us to keep the logics classical and simple. For instance, 
we may restrict our model and our logics to total functions. Then all terms that can be 
formulated over the given signature denote well-defined elements. 

Unfortunately, with this technique, which is often used in type theory and in a 
number of verification support systems like PVS (see [PVS 92]), recursive 
definitions need more work for their justification, since we have to prove termination. 
For that, an appropriate Noetherian order always has to be introduced explicitly, with 
respect to which the definition has to be proven inductive. Only in simple cases can this proof 
be carried out by schematic or even automatic proof techniques. More 
remarkable, however, is that certain recursive definitions of practical importance 
cannot be treated at all that way or at least not in a straightforward manner. Famous 
examples are functions with nonrecursively enumerable codomains such as 
interpreters of programming languages of universal computability (an example is 
typed λ-calculus with μ-recursion). For these examples a constructive definition of 
the inductive ordering does not exist. 

We use our running example of division on natural numbers to demonstrate the 
idea of inductive definitions. We work with the function symbol div that has two 
parameters. The critical question is, of course, which result the function div should 
return if the second parameter is 0. 

A recursive definition of the total function associated with the symbol div is given 
by the following conditional equations (let n, m be of sort Nat): 


div(m, n) = 0 ⇐ n > m 


div(m, n) = 1 + div(m−n, n) ⇐ m ≥ n ∧ n > 0 


In the second equation - read from left to right as a rewrite rule - the first argument is 
decreased provided n > 0. Note that the condition n > 0 for the second equation is 
crucial according to the assumption that div is a total function. Leaving it out would 
lead to the equation 


d=1+d 


for the natural number d = div(m, n) for the case n = 0, which introduces a 
contradiction into the theory of natural numbers. 

By the conditional equations the result of applying div to the parameters (m, 0) is 
obviously not specified. Therefore we characterise this approach to the treatment of 
recursion by the catchphrase total functions with underspecification. 

Underspecification means, of course, that the axiomatisation of the introduced 
function is incomplete. As a consequence, there are several functions that fulfil the 
axioms. A simple, but not very elegant trick to specify the function div uniquely after 
all would be to give an arbitrary specification for the case that the second parameter is 
0, for instance, by specifying: 


div(m, 0) = 0. 


However, this trick is by no means very elegant and even, in general, not always 
possible. The idea of restricting the equations by predicates that characterise the 
parameters for which the function is defined is not always so simple to achieve. In the 
case of functions with nonrecursive domains (more precisely, not recursively 
enumerable codomains), in fact, we cannot even formulate the domain restriction by a 
decidable (computable) condition. 

The two equations given above provide, of course, not a classical inductive 
definition. In a classical inductive definition we work at the left-hand side of the 
equation with patterns of the form div(0, n) and div(m+1, n). For division, such a 
version of a specification is neither very efficient nor very elegant nor intuitive. 
Nevertheless, it can be rather easily formulated as follows: 


div(0, n) = 0 
div(m+1, n) = if m+1 − div(m, n)*n ≥ n then div(m, n)+1 
              else div(m, n) 
              fi 


This is in fact a classical inductive definition, working with the standard Noetherian 
ordering on the natural numbers. In the case n = 0 we easily deduce the equation 


div(m, n) =m 


by the two defining equations which is certainly a possible, but of course arbitrary 
and therefore somewhat artificial choice for the value of div(m, 0). We might call 
such an implicit specification of the result of a function by some rather arbitrarily 
chosen value an overspecification. It constrains the function div in a way not justified 
by its underlying theory of arithmetic. 
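
The classical inductive definition can be executed directly, which also makes the overspecified value for n = 0 visible. The following Python sketch is a minimal illustration. The name div_ind is hypothetical and the recursion on the pattern div(m+1, n) is unrolled into a loop. 

```python
def div_ind(m, n):
    """Inductive definition of division on the naturals:
         div(0, n)   = 0
         div(m+1, n) = if m+1 - div(m, n)*n >= n then div(m, n)+1 else div(m, n) fi
    """
    d = 0                               # d = div(0, n)
    for k in range(m):                  # compute div(k+1, n) from d = div(k, n)
        if (k + 1) - d * n >= n:
            d = d + 1
    return d
```

For n > 0 the result agrees with integer division, for example div_ind(7, 2) gives 3. For n = 0 the condition is always true, so div_ind(m, 0) gives m, the arbitrary value discussed above. 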


4 Recursive Equations in CPOs and Metric Spaces 


Another option that avoids - in contrast to inductive principles - the requirements of 
the existence of an inductive order is to introduce either a partial ordering or a metric 
distance into the function space of the recursively defined function. This construction 
is done such that these sets are turned into complete partially ordered sets, or into 
complete metric spaces. Then it has to be shown that the defining functional is 
monotonic, or, in the sense of the metric, strongly contracting. From this, by general 
theorems, the existence of a (least) fixpoint can be concluded, which is a particular 
solution for the recursive equations. 


4.1 Least Fixpoints and CPOs of Partial Functions 


In the partial order approach even recursive equations are treated which do not define 
fixpoints uniquely. A specific fixpoint (“least fixpoint’) is associated with the 
recursive equation (or more precisely with the function associated with the right-hand 
side of the defining equation) by selecting the least function in that ordering that 
fulfils the equations. We may even treat recursive equations that way which, added in 
a naive approach for total functions, would introduce a contradiction. This is achieved 
by introducing elements representing “undefined” with specific logical properties”. 
This way partial functions are represented by total functions. 

In the general case, however, there may exist many solutions (meaning several 
functions that fulfil the equations) for the recursive equation. The classical approach 
of fixpoint theory is then to choose the least solution, the so-called least fixpoint of 
the functional, associated with the defining equations. 

As is well known, we can turn the set of the natural numbers by a simple 
extension into a complete partially ordered set (cpo). We introduce a pseudo element 
⊥ that serves as a dummy for the result of function applications that do not have a 
well-defined result. Along these lines, we define the following “natural” extension of 
the set of natural numbers 


N⊥ = N ∪ {⊥} 

We define the function 

div: N⊥ × N⊥ → N⊥ 

on the set N extended by ⊥. We specify the function div on this extended set by (∀ 
m, n ∈ N) the equations: 

div(m, n) = 0 ⇐ n > m, 

div(m, n) = div(m−n, n) + 1 ⇐ n ≤ m. 


Here we do not have to give any restrictions for the value of the argument n in the 
second equation. The reason is that for the case n = 0 we now get the 
equation 


div(m, 0) = div(m, 0) + 1 

which does not lead to a contradiction, since we may choose (and in this case can even 
deduce from the defining equations, since ⊥ is the only element that 
fulfils the equation) 


div(m, 0) = ⊥ 


One might ask why we introduce the dummy not only into the range of the function 
div but also into its domain. The answer is simple. If we want to freely form 
expressions of nested function applications such as in the term 


div(div(1, 0), 2) 


we have to allow for the case that also the values of the arguments of the function div 
might be ⊥. However, then we also have to specify the result of the function div in 
cases where one of its arguments is ⊥. For that case we choose a simple solution. We 
assume that div and all other arithmetic functions are strict functions. This means that 
the result of a function is ⊥ whenever one of its arguments is ⊥. For div 
we get the equations: 


div(x, ⊥) = div(⊥, x) = ⊥ 


This idea of a “strict” extension of total or partial functions to total functions on
domains and ranges that are extended by the element ⊥ can be applied schematically to
all other functions as well, such as the arithmetic functions. This extension is
required anyhow in order to cope properly with terms like div(n, 0) + 1.
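
The strict extension can be illustrated by a small Python sketch. It is only an
illustration under stated assumptions: the names BOTTOM, strict and plus are
hypothetical, ⊥ is encoded as a unique sentinel object, and div(m, 0) is set to ⊥
directly, as forced by the defining equations.

```python
# BOTTOM is a hypothetical stand-in for the dummy element ⊥ (an assumption of this sketch).
BOTTOM = object()


def strict(f):
    """Lift f to the extended sets: the result is BOTTOM whenever an argument is BOTTOM."""
    def lifted(*args):
        if any(a is BOTTOM for a in args):
            return BOTTOM
        return f(*args)
    return lifted


@strict
def div(m, n):
    # For n = 0 the defining equations only admit the solution div(m, 0) = ⊥.
    if n == 0:
        return BOTTOM
    q = 0
    while m >= n:          # unfold div(m, n) = div(m-n, n) + 1 while n <= m
        m, q = m - n, q + 1
    return q               # base case: div(m, n) = 0 when n > m


@strict
def plus(a, b):
    return a + b
```

With these definitions, nested terms such as div(div(1, 0), 2) and plus(div(5, 0), 1)
evaluate to BOTTOM, while div(7, 2) evaluates to 3.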

The set of strict functions on domains and ranges that are extended by the dummy ⊥
is isomorphic to the set of partial functions on the same domains and ranges
without the element ⊥. It is not difficult to reformulate (see [Broy 86]) all constructs
for strict functions for the set of partial functions and vice versa. However, when
we are interested in nonstrict functions, the concept of partial functions is no longer
powerful enough.

We introduce a partial order ⊑ on the set N⊥ as follows:

∀ m, n ∈ N⊥: m ⊑ n ⇔ (m = ⊥ ∨ n = m)

It extends to functions by pointwise application:

f ⊑ f' ⇔ ∀ m, n ∈ N⊥: f(m, n) ⊑ f'(m, n)
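
The least fixpoint with respect to this order can be approximated by iterating the
functional induced by the defining equations of div, starting from the everywhere
undefined function. The following Python sketch is a minimal illustration, assuming
that ⊥ is encoded as None; the names t, approx and leq are hypothetical helpers of
this sketch.

```python
BOTTOM = None  # hypothetical encoding of ⊥ for this sketch


def t(f):
    """Functional induced by the equations for div (strict in both arguments)."""
    def g(m, n):
        if m is BOTTOM or n is BOTTOM:
            return BOTTOM
        if m < n:
            return 0
        r = f(m - n, n)
        return BOTTOM if r is BOTTOM else r + 1
    return g


def approx(k):
    """k-th approximation: t applied k times to the everywhere-undefined function."""
    f = lambda m, n: BOTTOM
    for _ in range(k):
        f = t(f)
    return f


def leq(x, y):
    """Flat order on the extended naturals: x ⊑ y iff x = ⊥ or x = y."""
    return x is BOTTOM or x == y
```

In this sketch approx(3)(7, 2) is still undefined, approx(4)(7, 2) yields 3, and
approx(k)(m, 0) stays undefined for every k, which mirrors div(m, 0) = ⊥.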


A related approach to domains for recursive equations is based on metric spaces. In
the metric space approach the treatment of partiality and underspecification is not so
simple. The functions for which we want to find a fixpoint are required to be strongly
contracting, in general. If they are, they have a unique fixpoint¹.

We do not give a metric version of the treatment of the recursive equation for div,
since the classical approaches work only for total functions for which the defining
functionals are contracting and have unique fixpoints (see [de Bakker, Zucker 84]).
Therefore they do not apply immediately to our example. We come back to the
metric space approach in section 4.3 when working with sets of functions.

¹ We may work with set-valued functions instead of functions producing single elements as
results. Then we may drop the requirement of strong contractivity and replace it by weak
contractivity.


4.2 Complete Lattices of Predicates


A technique that is very similar to the cpo-based least fixpoint approach is the
complete lattice of predicates. As is well known, predicates are partially ordered by
logical implication and form a complete lattice. We explain the idea with the help of
our example. Let us consider the following functional

t: (N⊥ × N⊥ → N⊥) → (N⊥ × N⊥ → N⊥)

that maps functions onto functions and that is specified by the term in the defining
equation for division as follows:

t[f](m, n) = if m < n then 0 else 1 + f(m-n, n) fi

The functional t is induced by the recursive equation that specifies div. The
fixpoint equation then reads as follows:


div = t[div]. 


We may replace the functional t associated with the recursive equation by a predicate
transformer that operates on predicates over functions:

T: ((N⊥ × N⊥ → N⊥) → B) → ((N⊥ × N⊥ → N⊥) → B)

It is specified by (note the similarity to the functional t introduced above) the
recursive equation

T[Q].f = ∀ m, n ∈ N: (m < n ⇒ f(m, n) = 0)
                   ∧ (m ≥ n ⇒ ∃ f': Q[f'] ∧ f(m, n) = 1 + f'(m-n, n))

or expressed with the help of the function t:

T[Q].f = ∃ f': f = t[f'] ∧ Q[f']

Obviously, the function T is an inclusion monotonic function on predicates².
Therefore it has (recall that the set of predicates forms a complete lattice), as is well
known from μ-calculus, a weakest and a strongest fixpoint. It is not difficult to show
that the strongest fixpoint of the predicate transformer T is the predicate λ f: false and
the weakest fixpoint is the predicate

λ f: ∀ m, n ∈ N: (m < n ⇒ f(m, n) = 0)
               ∧ (m ≥ n ∧ n > 0 ⇒ f(m, n) = 1 + f(m-n, n))

Hence the weakest fixpoint is the predicate on functions that characterises all
functions that fulfil the defining equations for div but produce arbitrary results in the
case the second parameter is 0.


In general, we may treat recursive equations along the lines explained above as
follows. Introducing a function

f: D → R

specified by the recursive equation

f(x) = E

we may translate the recursive equation into a predicate

Q: (D → R) → B

specified by the equivalence

Q[f] = ∀ x: ∃ f1, ..., fk: Q[f1] ∧ ... ∧ Q[fk] ∧ f(x) = E*

where the expression E* is defined as in the introduction, where we defined

E = E*[f/f1, ..., f/fk]

² We also call T a predicate transformer.


This definition of the predicate Q never introduces a contradiction since the right-hand
side is inclusion monotonic (or, in other terms, implication monotonic) in the
predicate Q. According to μ-calculus a strongest and a weakest solution exist. We can
easily show, moreover, that the weakest solution is never the strongest predicate λ f:
false provided the equation f(x) = E specifying the function f has a solution. We only
have to choose that solution for all the functions f1, ..., fk to obtain one solution. But
there are many other solutions, in general. More precisely, there may be functions f
that fulfil the proposition Q[f] where Q is the weakest solution. These functions need
not be fixpoints of t. In the case of the function div as defined by the cpo approach
these functions f all have the property div ⊑ f.

We may replace the definition of Q by the more liberal equation

Q.f = ∃ f': Q.f' ∧ t[f'] ⊑ f

We prove that the predicate

Q.f = div ⊑ f

is a fixpoint of T where

T[Q].f = ∃ f': Q.f' ∧ t[f'] ⊑ f

as follows:

T[Q].f

= ∃ f': t[f'] ⊑ f ∧ Q.f'

= ∃ f': t[f'] ⊑ f ∧ div ⊑ f'

Monotonicity of t shows

  div
=   {fixpoint property}
  t[div]
⊑   {monotonicity of t, div ⊑ f'}
  t[f']
⊑ f

Thus T[Q].f ⇒ div ⊑ f.

Now assume div ⊑ f. Then (div is a fixpoint)

t[div] ⊑ f

Thus, choosing f' = div (for which Q.f' holds since div ⊑ div), we have

∃ f': t[f'] ⊑ f ∧ Q.f'

This shows that div ⊑ f ⇒ T[Q].f.


4.3 Metric Spaces 


Similarly to the approach above, where we work with predicates on functions, we may
work with metric spaces over the set of functions or relations. We introduce a metric
distance on the function space (D → R):

d: (D → R) × (D → R) → ℝ

such that (D → R) is a complete metric space. A function

Φ: (D → R) → (D → R)

is called weakly contracting if for all functions g1, g2 ∈ (D → R):

d(g1, g2) ≥ d(Φ(g1), Φ(g2))

We call the function Φ strongly contracting if there exists a real number ε ∈ ℝ with
0 < ε < 1 such that

ε·d(g1, g2) ≥ d(Φ(g1), Φ(g2))

The idea of metric spaces for proving the existence of fixpoints is well known. Given
a complete metric space (X, d) and a strongly contracting function

t: X → X

there exists a unique fixpoint of t.

The critical issue is to select an appropriate metric distance on functions. In our
example of functions in N × N → N we may define a metric distance as follows
(where ε is a number with 0 < ε < 1):

d(f1, f2) = max {ε^m: f1(n, m) ≠ f2(n, m)}


This metric induces a metric distance on sets of functions and turns the function space
into a complete metric space. In fact, with this metric distance every inductive
definition leads to a contractive functional. Note that in our running example the
functional t is not strongly contracting. The technique of metric spaces as we
introduced it works only for recursive equations that define least fixpoints that are
total functions.


5 Lattice of Predicate Logics


When following logic-oriented approaches to give semantics to recursion we work
with inclusion-monotonic functions on predicates, called predicate transformers,
where the specified functions are represented as relations or predicates. Recursive
equations can be mapped onto predicate transformers, and for predicate transformers
we can apply constructions from μ-calculus. This way we can associate strongest or
weakest predicates with recursive equations specifying functions. These predicates
characterise functions or relations.

Note that both the choice of the weakest as well as the choice of the strongest 
solution of recursive equations leads to interesting interpretations of recursive 
equations. An example for such a treatment is given already at the end of the previous 
section. 

We may also define the ternary predicate Div on the naturals (representing a relation)
directly recursively by the following logical equivalence:

Div(m, n, r) = if m < n then r = 0
               elif r = 0 then false
               else Div(m-n, n, r-1)
               fi

which reads in a more logical style (translating the if-then-else-fi into classical logical
connectives)

Div(m, n, r) = (m < n ∧ r = 0) ∨ (r > 0 ∧ m ≥ n ∧ Div(m-n, n, r-1))


It is not difficult to show that from these equivalences we obtain directly the
following conditional equivalences

m < n ⇒ Div(m, n, r) = (r = 0)
m ≥ n ∧ r = 0 ⇒ Div(m, n, r) = false
m ≥ n ∧ r > 0 ⇒ Div(m, n, r) = Div(m-n, n, r-1)

From these formulas we can deduce (via an easy proof by induction on the naturals)
the following logical consequence:

n > 0 ⇒ Div(m, n, r) = (0 ≤ m-n*r < n)

In the case n = 0 we obtain the equivalences

r = 0 ⇒ Div(m, 0, r) = false

r > 0 ⇒ Div(m, 0, r) = Div(m, 0, r-1)

So by straightforward induction the only choice for the logical value of Div(m, 0, r) is

∀ m, r ∈ N: Div(m, 0, r) = false
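
The two consequences above can be checked on small arguments with a short Python
sketch. It is only an illustration: the recursive definition is transcribed directly
from the if-then-elif-else form, and the helper name closed_form is a hypothetical
name introduced here.

```python
def Div(m, n, r):
    """Ternary predicate defined by the recursive equivalence for Div."""
    if m < n:
        return r == 0
    if r == 0:
        return False
    return Div(m - n, n, r - 1)   # r decreases, so the recursion terminates


def closed_form(m, n, r):
    """Claimed characterisation for n > 0: 0 <= m - n*r < n."""
    return 0 <= m - n * r < n
```

For n > 0 both functions agree on every tested triple, and Div(m, 0, r) is false
throughout, because the recursion for n = 0 only counts r down to 0.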


In contrast to functions modelling division, in propositions Div(m, n, r) and in their
isomorphic representation by ternary relations we do not have any indication which of
the three arguments are considered as input and which as output for the operation to
be defined. It is therefore more explicit to work instead with a set-valued function

DIV: N × N → ℘(N)

that is specified by the following equation

DIV(m, n) = if m < n then {0}
            else {y+1: y ∈ DIV(m-n, n)}
            fi


Recall that the powerset is a complete lattice ordered by set inclusion. Moreover,
consider the functional Θ, defined by the equation

Θ[F](m, n) = if m < n then {0}
             else {y+1: y ∈ F(m-n, n)}
             fi

We have DIV = Θ[DIV]. Θ is monotonic with respect to the set inclusion ordering.
More precisely, it is monotonic with respect to the ordering on set-valued functions
induced by pointwise application of the inclusion ordering on sets. Therefore, since
the sets form a complete lattice, there exists an inclusion least and an inclusion
greatest fixpoint according to Knaster-Tarski. The least fixpoint is described by the
set-valued function

λ m, n: {r ∈ N: 0 ≤ m-n*r < n}

The greatest fixpoint is given by the same set-valued function. The set-valued
functions are isomorphic to relations. They stress, however, which of the elements of
the tuples in a relation are input and which are considered as output.
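
The least fixpoint can be computed on a finite grid of arguments by iterating the
functional from the empty sets. The Python sketch below is a minimal illustration
under the assumption that the arguments are restricted to 0..M and 0..N; the names
theta and least_fixpoint and the table representation are hypothetical choices of this
sketch.

```python
def theta(F, M, N):
    """One application of the functional to a table F[(m, n)] of sets of naturals."""
    G = {}
    for m in range(M + 1):
        for n in range(N + 1):
            if m < n:
                G[(m, n)] = {0}
            else:
                # m >= n, hence (m - n, n) lies inside the grid
                G[(m, n)] = {y + 1 for y in F[(m - n, n)]}
    return G


def least_fixpoint(M, N):
    """Iterate theta from the everywhere-empty table until nothing changes."""
    F = {(m, n): set() for m in range(M + 1) for n in range(N + 1)}
    while True:
        G = theta(F, M, N)
        if G == F:
            return F
        F = G
```

On such a grid the result for n > 0 is the singleton containing the quotient, and
the result for n = 0 is the empty set, in line with the set-valued function above.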

Of course, we may also work with sets of natural numbers extended by the element ⊥
as a dummy for undefined when associating relations or set-valued functions with
recursive definitions. This way we obtain combinations of the partial order approach
and the lattice of sets approach.


6 Sets of Models 


When dealing with algebraic equations for the specification of functions which can 
also be seen as recursive equations for functions it is common by now to work with so 
called loose semantics approaches (see [Broy, Wirsing 82]). This means that we 
associate not only exactly one model with a set of axioms, such as for instance an 
initial model, but, in general, a set of models in terms of heterogeneous algebras with 
an algebraic specification (which is a logical theory). If we consider a purely 
equational specification and restricted forms of axioms we can identify extreme 
models in the class of models such as initial or terminal algebras. This works even in 
the case of conditional equations. These initial or terminal algebras are closely related 
to strongest and weakest solutions in a form of predicates that are associated with the 
logical treatment of recursion (see [Broy, Wirsing 80]). 

Let the following specification of natural numbers be given (we follow closely the
syntax and concepts of Larch, see [Larch 93]):

SPEC NAT =
{ based_on BOOL
  sort Nat
  0 : Nat,
  succ, pred : Nat → Nat,
  iszero : Nat → Bool,
  +, -, * : Nat, Nat → Nat, Infix

  Nat generated_by 0, succ,

  iszero(0) = true,
  iszero(succ(x)) = false,
  pred(succ(x)) = x,
  0+y = y,
  succ(x)+y = succ(x+y),
  x-0 = x,
  0-y = y,
  succ(x)-succ(y) = x-y,
  0*y = 0,
  succ(x)*y = y+(x*y) }
It defines the natural numbers with the help of an induction principle and some basic
operations (for details see [Larch 93]). Recall that functions in Larch are assumed to
be total. If we add the following specification fragment (extending the specification
NAT)

div: Nat, Nat → Nat
div(m, n) = 0 ⇐ m < n (*)
div(m, n) = div(m-n, n)+1 ⇐ m ≥ n (**)

to the specification NAT above, we get a contradiction (all functions are assumed to
be total), since with the help of the generation principle, which gives the basis for
induction proofs, we may deduce the proposition ∀ n ∈ Nat: n ≠ n+1. As
demonstrated before, the equation

div(m, 0) = div(m, 0)+1

then leads to a contradiction. If we drop the term generation principle (this is the
principle to consider standard models only) then induction is no longer available as a
proof principle and the contradiction can no longer be deduced, since div(m, 0) may be
a non-standard value.

However, giving up induction would hurt. The other option to avoid the
inconsistency while maintaining the principle of induction is to use the following
conditional defining equation instead:

div(m, n) = div(m-n, n)+1 ⇐ m ≥ n ∧ n > 0

Adding only this equation and the equation (*), there exist many models for
the enriched specification. Each of these models contains a function div with arbitrary
choices for the results of the function applications div(m, 0). This approach corresponds
again exactly to the idea of underspecification.


7 The Herbrand Universe 


Another area where recursive declarations are used is logic programming. Here we
work with term models, the so-called Herbrand models, which are used to interpret
“recursive” Horn clauses.

When dealing with ideas from logic programming we do not represent operations
like division by functions but by relations or by predicates. Along these lines we may
describe division by the predicate

Div: N × N × N → B

specified by the following Horn clauses:

Div(m, n, 0) ⇐ m < n
Div(m+n, n, r+1) ⇐ Div(m, n, r)

Of course, there are many predicates Div that fulfil these axioms. The weakest of
these predicates is true (more precisely the predicate λ x, y, r: true). The strongest
predicate corresponds to the so-called closed world assumption, which leads to the
strongest predicate that fulfils the Horn clauses. It is specified by the equivalence

Div(m, n, r) = (0 ≤ m-n*r < n)


This relation Div directly represents the partial function div discussed extensively 
above. The closed world assumption simply assumes that all facts that are not 
explicitly stated as being true (more precisely that cannot be deduced logically from 
the axioms) are false. This is exactly mirrored by the possible computations (which 
may be seen as logical deductions) and also by the strongest fixpoint. This is the 
standard semantics used in logic programming. 
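
The closed world reading can be made concrete by forward chaining on the two Horn
clauses. The Python sketch below is an illustration only, assuming a restriction of
all three arguments to the range 0..bound; the names least_model and is_model are
hypothetical.

```python
def least_model(bound):
    """Facts derivable from the two Horn clauses with all arguments <= bound."""
    # Clause 1: Div(m, n, 0) holds whenever m < n.
    facts = {(m, n, 0) for n in range(bound + 1) for m in range(n)}
    frontier = list(facts)
    while frontier:
        m, n, r = frontier.pop()
        new = (m + n, n, r + 1)      # Clause 2: Div(m+n, n, r+1) follows from Div(m, n, r)
        if m + n <= bound and new not in facts:
            facts.add(new)
            frontier.append(new)
    return facts


def is_model(pred, bound):
    """Check both Horn clauses for pred on all arguments up to bound."""
    rng = range(bound + 1)
    clause1 = all(pred(m, n, 0) for n in rng for m in range(n))
    clause2 = all(pred(m + n, n, r + 1)
                  for m in rng for n in rng for r in rng if pred(m, n, r))
    return clause1 and clause2
```

On the bounded universe the derived facts coincide with the triples satisfying
0 ≤ m-n*r < n, and both this predicate and the constantly true predicate pass the
is_model check, whereas the constantly false predicate violates the first clause.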

Of course, there are many other solutions (other predicates that fulfil the
equations). For every natural number k ∈ N we get a predicate Div_k specified by the
equation

Div_k(m, n, r) = (0 ≤ m-n*r < n) ∨ k < r.

These relations Div_k and the strongest fixpoint are examples of fixpoints (solutions)
of the defining Horn clauses for Div.


8 Conclusion


Recursion is a fundamental concept in computer science. Recursion is used both 
explicitly, such as in recursive data type declarations, recursive function declarations, 
or in formal languages, and implicitly, such as in loops, everywhere. There are many 
options to treat the semantics of recursive equations. They all have serious impacts on 
the logical theories and mathematical models of programming. 

Let us finally survey once more, briefly, the options considered for the treatment of
recursion. To treat recursive declarations we have to observe the following facts:

Models with total functions without any extension to “undefined” cannot be 
extended by recursive equations without running into contradictions for certain 
recursive equations. We have to be careful to add conditions to those equations to 
avoid contradictions. However, in the worst case, such conditions are not recursive 
and thus not computable. 

We can extend the logic of total functions to partial functions, functions on cpos,
relations, predicates, set-valued functions or even sets of functions. All such
extensions can be used to treat recursion. In fact, such an extension makes the logic
more sophisticated, in general.

As we have demonstrated the different possibilities may be combined. For 
instance, we can work with sets of total functions or with sets of partial functions. 
Each of these approaches has its advantages and disadvantages. Working with total 
functions allows us to keep the logics simple, for the price that not all computable 
functions can be described by computable expressions and that contradictions and 
incompleteness may be introduced. 

A second disadvantage, from a practical point of view perhaps a more serious one, 
of the concept of total functions with underspecification is the fact that the arguments 
for which, operationally speaking, the recursion does not terminate are not 
distinguished logically from the cases where the values of the function are well- 
defined by a terminating recursion. This is awful and unacceptable from the point of 
view of software engineering since reasoning about exceptions, termination, and 
definedness is an important part of the specification and analysis of reliability and 
verification of programs. We want to be able to distinguish bad and unacceptable
arguments from good acceptable ones! Therefore we are in favour of explicit
representations of undefined (see the discussion in [Hehner 84]).


Appendix 


As we have shown, a purely equational treatment and characterisation of solutions of
recursive equations is difficult. However, there is one logical “trick” that allows us to
work with total functions even in cases of recursive definitions that lead within a
logical setting to least fixpoints that are functions with non-recursively enumerable
codomains. For each function

f: D → R

that is specified by the recursive equation

(*) f(x) = E

we introduce together with the function symbol f a corresponding predicate

dom_f: D → B

that characterises the subset of the domain D for which the function f has to have a
well-specified result, and we therefore replace the recursive equation (*) by the weaker
implication

dom_f(x) ⇒ f(x) = E

Of course, this implication alone is not strong enough to characterize the function
associated with the symbol f and certainly not at all to characterize the predicate dom_f
the way we want it. A trivial choice to fulfil the conditional equation would be to
choose dom_f(x) = false. Then the conditional equation is trivially fulfilled. Therefore
we need additional axioms for specifying the domain restriction predicate dom_f.

We assume for simplicity that all given function and operation symbols g: D → R
in our signature have an associated domain predicate

dom_g: D → B


We introduce a syntactic rewrite procedure that produces for every term E a logical
formula DEF[E] that characterizes the proposition “the value of the expression E is
well defined”. It is specified as follows:

DEF[x] = true for identifiers x
DEF[h(E1, ..., En)] = dom_h(E1, ..., En) ∧ DEF[E1] ∧ ... ∧ DEF[En]
DEF[if C then E1 else E2 fi] = DEF[C] ∧ (C ⇒ DEF[E1]) ∧ (¬C ⇒ DEF[E2])

For total functions g we simply choose dom_g(x) = true for all x.
Based on these definitions we replace the recursive equation

f(x) = E

by the following two axioms

(**) dom_f(x) ⇒ f(x) = E
     DEF[E] ⇒ dom_f(x)

If all function symbols g occurring in function calls in the expression E, except the
function f itself, are totally defined in the sense of dom_g(x) = true for all x, we can
simplify this treatment. The formula DEF[E] then only refers to the definedness of the
recursive calls in the term E.


By this simple encoding of the domain predicate we work with underspecification
both for the function f and the domain predicate dom_f. Formally, dom_f is a predicate
and has nothing to do with the function f. However, the predicate is used as a guard
for the recursive defining equation for f. So the defining equation is not required to be
valid if dom_f(x) is false. This avoids contradictions, since in the case that the function
application f(x) does not terminate we cannot derive dom_f(x) = true and thus may
always choose dom_f(x) = false.


We demonstrate how our idea works in the case of our simple example. In the case
of division we get the following defining equations for div:

dom_div(m, n) ∧ n > m ⇒ div(m, n) = 0

dom_div(m, n) ∧ n ≤ m ⇒ div(m, n) = 1 + div(m-n, n)

and the following axioms for dom_div:

n > m ⇒ dom_div(m, n)

dom_div(m-n, n) ⇒ dom_div(m, n)

By this we can prove the definedness of the function div, introduced by a recursive
equation, for all its arguments (m, n) with n > 0. This way we exactly mimic the way
computations are executed.

However, the introduction of a domain predicate is only a logical trick which
encodes the classical idea of fixpoint theory for partial functions into logics of total
functions. In the case of our example we can prove

dom_div(m, n) = (n ≠ 0)

Moreover, by contradiction (assuming div is a total function), we can prove
¬dom_div(m, 0), since dom_div(m, 0) would allow us to deduce

div(m, 0) = 1 + div(m, 0)

Hence a proof by contradiction yields ¬dom_div(m, 0).

Note, however, that in general the predicate dom_f is not uniquely specified by the
axioms (**). If several fixpoints exist for the recursive equation f(x) = E then
several solutions for the domain predicate dom_f exist as well. If we choose the
strongest predicate for dom_f, this reflects the idea of the least defined fixpoint.
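
The strongest choice for the domain predicate of div can be computed on a finite
grid by closing the empty set under the two axioms for dom_div. The following
Python sketch is an illustration under stated assumptions: arguments are restricted
to 0..bound, the second axiom is only applied for n ≤ m so that m-n stays a natural
number, and the name strongest_dom is hypothetical.

```python
def strongest_dom(bound):
    """Smallest set of pairs (m, n) closed under the two axioms for dom_div."""
    dom = set()
    changed = True
    while changed:
        changed = False
        for m in range(bound + 1):
            for n in range(bound + 1):
                if (m, n) in dom:
                    continue
                # Axiom 1: n > m  =>  dom_div(m, n)
                # Axiom 2: dom_div(m-n, n)  =>  dom_div(m, n)   (applied for n <= m)
                if n > m or (m - n, n) in dom:
                    dom.add((m, n))
                    changed = True
    return dom
```

On the grid the result contains exactly the pairs with n ≠ 0: for n = 0 the second
axiom only refers back to dom_div(m, 0) itself, so no such pair is ever derived.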


References 


[Broy 86] 
M. Broy: Partial interpretations of higher order algebraic types. (Invited lecture) In: J. 
Gruska (ed): Mathematical Foundations of Computer Science-13th Symposium, Lecture 
Notes in Computer Science 233, Berlin-Heidelberg-New York-Tokyo: Springer 1986, 29-43 

[Broy, Wirsing 80] 
M. Wirsing, M. Broy: Abstract data types as lattices of finitely generated models. In: P.
Dembinski (ed.): Mathematical Foundations of Computer Science - 9th Symposium.
Rydzyna 1980, Lecture Notes in Computer Science 88, Berlin-Heidelberg-New York:
Springer 1980, 673-685

[Broy, Wirsing 82] 
M. Broy, M. Wirsing: Initial versus terminal algebra semantics for partially defined abstract
types. Technische Universität München, Institut für Informatik, TUM-I8018, December
1981. Revised version: Partial Abstract Types, Acta Informatica 18, 1982, 47-64


[Broy, Pepper, Wirsing 87] 
M. Broy, M. Pepper, M. Wirsing: On the algebraic definition of programming languages.
Technische Universität München, Institut für Informatik, TUM-I8204, 1982. Revised
version in TOPLAS 9:1 (1987) 54-99

[de Bakker, Zucker 84] 
J. W. de Bakker and J. I. Zucker. Processes and the denotational semantics of concurrency. 
Information and Control, 54(1/2):70-120 

[Hehner 84] 
E.C.R. Hehner: Predicative Programming. Comm. ACM 27:2, 1984, 134-151 

[Knaster-Tarski] 
A. Tarski: A lattice-theoretical fixpoint theorem and its application. Pacific Journal of 
Mathematics Vol. 5, 1955, 285-309 

[Larch 93] 
John V. Guttag and James J. Horning, with S.J. Garland, K.D. Jones, A. Modet, and J.M. 
Wing: Larch: Languages and Tools for Formal Specification, Springer-Verlag Texts and 
Monographs in Computer Science, 1993 

[λ-calculus 81]
H.P. Barendregt: The Lambda Calculus: Its Syntax and Semantics. North-Holland 1981 

[μ-calculus 73]
P. Hitchcock, D. Park: Induction rules and termination proofs. M. Nivat (ed.): Proc. 1st
ICALP. North Holland 73

[π-calculus 99]
R. Milner: Communicating and mobile systems: the π-calculus. Cambridge University Press
1999

[Prolog/Herbrand Universe 87] 
J. Lloyd. Foundations of Logic Programming: 2nd Edition. Springer-Verlag, 1987 

[PVS 92] 
S. Owre, J. M. Rushby, N. Shankar: PVS: A Prototype Verification System. In: Deepak 
Kapur (ed.):11th Conference on Automated Deduction, Saratoga, NY, Jun, 1992 

[Schieder, Broy 99] 
B. Schieder, M. Broy: Adapting Calculational Logic to the Undefined. The Computer 
Journal, Vol. 42, No. 2, 1999 

[Sintzoff 87] 
M. Sintzoff: Expressing program developments in a design calculus. M. Broy (ed.): Logic of 
programming and calculi of discrete design. Springer NATO ASI Series, Series F: 
Computer and System Sciences, Vol. 36, 1987, 343-365 

[Scott 81] 
D. Scott: Lectures on a mathematical theory of computation. In: Theoretical Foundations of 
Programming Methodology, edited by M. Broy and G. Schmidt. D. Reidel Publishing 
Company, 1982, pp. 145 - 292 


Completion Is an Instance of Abstract Canonical
System Inference

Abstract. Abstract canonical systems and inference (ACSI) were introduced
to formalize the intuitive notions of good proof and good inference
appearing typically in first-order logic or in Knuth-Bendix like completion
procedures.

Since this abstract framework is intended to be generic, it is of fundamental
interest to show its adequacy to represent the main systems of
interest. This has been done for ground completion (where all equational
axioms are ground) but was still an open question for the general completion
process.

By showing that the standard completion is an instance of the ACSI
framework we close the question. For this purpose, two proof representations,
proof terms and proofs by replacement, are compared to build
a proof ordering that provides an instantiation adapted to the abstract
canonical system framework.


Classification: Logic in computer science, rewriting and deduction, 
completion, good proof, proof representation, canonicity. 


1 Introduction 


The notion of good proof is central in mathematics and crucial when mechanizing
deduction, in particular for defining useful and efficient tactics in proof
assistants and theorem provers. Motivated on one hand by this quest for good
proof theory and on the other by the profound similarities between many proof
search approaches, N. Dershowitz and C. Kirchner proposed in [17] [18] a general
framework based on ordering the set of proofs. In this context the best proofs
are simply the minimal ones. Once one has defined what the best proofs are by
means of a proof ordering, the next step is to obtain the best presentation of
a theory, i.e. the set of axioms necessary for obtaining the best proofs for all the
theory, but not containing anything useless.

To formalize this, the notion of good inference was introduced by M.P.
Bonacina and N. Dershowitz [6]. Given a theory, its canonical presentation is
defined as the set of the axioms needed to obtain the minimal proofs. It is general
enough to produce all best proofs, leading to a notion of saturation, but
it does not contain any redundant information, hence the notion of contraction.
Presentations, i.e. sets of axioms, are then transformed using appropriate
deduction mechanisms to produce this canonical presentation.

This led to the Abstract Canonical Systems and Inference (ACSI) generic
framework presented in [18] [6].

The ACSI framework got its sources of inspiration from three related points.
First, the early works on proof orderings as introduced in [3] and [4] to prove the
completeness of completion procedures à la Knuth-Bendix. Second, the developments
about redundancy [24] [5] to focus on the important axioms to perform
further inferences. Last but not least, the completion procedure [31], central
in most theorem proving tools where an equality predicate is used. This procedure
has been refined, mainly for two purposes: to have a more specific and thus
more efficient algorithm when dealing with particular cases, or to increase the
efficiency while remaining general. For the first case, a review of specific completion
procedures for specific algebraic structures can be found in [33]. For the
second case, completion has been extended to equational completion [25] [36] [28];
inductionless induction, initiated by J.A. Goguen [21] and D. Musser [35]; and
ordered completion [32] [24] [4], to mention only a few. One important application
of the completion procedure is rewrite based programming, either based on
matching or on unification. The seminal work of J.A. Goguen on OBJ and its
various incarnations plays a preeminent role in this class of algebraic languages
and has directly inspired CafeOBJ [20], ELAN [8] or Maude [14]. When
the operational semantics of the language is based on unification, we find logic 
programming languages of the Prolog family, where EQLOG [23] is also a pre- 
eminent figure. Good syntheses about completion based rewrite programs can 
be found in E]. 

Several works aim to unify these different completion procedures, and to
make them special cases of a more general process. The notion of critical-pair
completion procedure was introduced by [10] and covers not only standard completion,
but also Buchberger’s algorithm for Gröbner bases [9] [42] and resolution [37].
Indeed, R. Bündgen showed that Buchberger’s algorithm can be simulated by
standard completion [11]. This concept of critical-pair completion was categorically
formalized by K. Stokkermans [40]. Other generalizations can be found in the
works of M. Schorlemmer [39], M. Aiguier and D. Bahrami [1] or in the PhD thesis of G.
Struth [41], where standard completion, Buchberger’s algorithm and resolution
are shown to be special instances of a non-symmetric completion procedure.

But, even if initially motivated by these three points, the ACSI framework
has been developed as a fully stand-alone theory. This theory provides important
abstract results based on basic hypotheses on proofs and a few postulates.

Therefore, a main question remains: is this framework indeed useful? Does
this theory allow us to uniformly understand and prove the main properties of a
proof system, centered around the appropriate ordering on proofs?

At the price of a slight generalization of two postulates, it is shown in [12]
that good proofs in natural deduction are indeed the cut-free proofs as soon as
proofs are compared using the ordering induced by beta reduction over the simply
typed lambda-terms. For ground completion, the adequacy of the framework
has been shown in [16], leaving the more general question of standard completion
open.

This paper proves the adequacy of the framework for the standard completion
procedure, generalizing in a non-trivial way the result of [16] and showing the
usefulness of abstract canonical systems. This brings serious hopes that the ACSI
framework is indeed well adapted and useful to uniformly understand and work
with other algorithms, in particular all the ones based on critical-pair completion.

The next section summarizes the framework of abstract canonical systems,
as defined in [18] [6], and briefly recalls the standard completion. Section 3 deals
with two representations of proofs in equational logic, namely as proof terms in
the rewriting logic [34], and as proofs by replacement [3]. We will show how to
combine them to keep the tree structure of the first one, and the ordering associated
with the second one, which is well adapted to prove the completeness of the
standard completion. Finally, in Section 4 we will apply the abstract canonical
systems framework to this proof representation to show the completeness of the
standard completion. The proof details are given in the Appendix.


2 Presentation 


2.1 Abstract Canonical Systems 


The results in this section are extracted from [18,6], which should be consulted
for motivations, details and proofs.

Let $\mathbb{A}$ be the set of all formulae over some fixed vocabulary. Let $\mathbb{P}$ be the set
of all proofs. These sets are linked by two functions: $[\cdot]^{Pm} : \mathbb{P} \to 2^{\mathbb{A}}$ gives the
premises in a proof, and $[\cdot]_{Cl} : \mathbb{P} \to \mathbb{A}$ gives its conclusion. Both are extended to
sets of proofs in the usual fashion. The set of proofs built using assumptions in
$A \subseteq \mathbb{A}$ is denoted by$^1$


$$Pf(A) \stackrel{!}{=} \{p \in \mathbb{P} : [p]^{Pm} \subseteq A\} .$$


The framework proposed here is predicated on two well-founded partial
orderings over $\mathbb{P}$: a proof ordering $>$ and a subproof relation $\rhd$. They are
related by a monotonicity requirement (postulate E). We assume for convenience
that the proof ordering only compares proofs with the same conclusion
($p > q \Rightarrow [p]_{Cl} = [q]_{Cl}$), rather than mention this condition each time we have
cause to compare proofs.

We will use the term presentation to mean a set of formulae, and justification
to mean a set of proofs. We reserve the term theory for deductively closed
presentations:


$$Th\,A \stackrel{!}{=} [Pf(A)]_{Cl} = \{[p]_{Cl} : p \in \mathbb{P},\ [p]^{Pm} \subseteq A\} .$$


$^1$ $\stackrel{!}{=}$ is used for definitions.

Theories are monotonic:


Proposition 1 (Monotonicity). For all presentations A and B:
$$A \subseteq B \Rightarrow Th\,A \subseteq Th\,B$$


Presentations A and B are equivalent ($A \equiv B$) if their theories are identical:
$Th\,A = Th\,B$. In addition to this, we assume the two following postulates:


Postulate A (Reflexivity). For all presentations A:


$$A \subseteq Th\,A$$


Postulate B (Closure). For all presentations A:
$$Th\,Th\,A \subseteq Th\,A$$


We call a proof trivial when it proves only its unique assumption and has no
subproofs other than itself, that is, if $[p]^{Pm} = \{[p]_{Cl}\}$ and $p \unrhd q \Rightarrow p = q$, where
$\unrhd$ is the reflexive closure of the subproof ordering $\rhd$. We denote by $\hat{a}$ such a
trivial proof of $a \in \mathbb{A}$ and by $\hat{A}$ the set of trivial proofs of each $a \in A$.

We assume that proofs use their assumptions (postulate C), that subproofs
don't use non-existent assumptions (postulate D), and that proof orderings are
monotonic with respect to subproofs (postulate E):


Postulate C (Trivia). For all proofs p and formulae a:


$$a \in [p]^{Pm} \Rightarrow p \unrhd \hat{a}$$


Postulate D (Subproofs Premises Monotonicity). For all proofs p and q:


$$p \unrhd q \Rightarrow [p]^{Pm} \supseteq [q]^{Pm}$$


Postulate E (Replacement). For all proofs p, q and r:


$$p \rhd q > r \Rightarrow \exists v \in Pf([p]^{Pm} \cup [r]^{Pm}).\ p > v \rhd r$$


We make no other assumptions regarding proofs or their structure. As remarked
in [6], the subproof relation essentially defines a tree structure over proofs: a
"leaf" is a proof with no subproofs but itself, and direct subproofs, i.e. subproofs
that are not subproofs of another subproof, can be considered as "subtrees".
These trees can be infinitely branching, but their height is finite because of the
well-foundedness of $\rhd$.

The proof ordering $>$ is lifted to an ordering $\succsim$ over presentations:


$$A \succsim B \text{ if } A \equiv B \text{ and } \forall p \in Pf(A)\ \exists q \in Pf(B).\ p \geq q .$$





Completion Is an Instance of Abstract Canonical System Inference 501 


We define what a normal-form proof is, i.e. one of the minimal proofs of
$Pf(Th\,A)$:


$$Nf(A) \stackrel{!}{=} \mu Pf(Th\,A) \stackrel{!}{=} \{p \in Pf(Th\,A) : \neg\exists q \in Pf(Th\,A).\ p > q\} .$$





The canonical presentation contains those formulae that appear as assumptions
of normal-form proofs:


$$A^\sharp \stackrel{!}{=} [Nf(A)]^{Pm} .$$


So, we will say that A is canonical if $A = A^\sharp$.
A presentation A is saturated if it supports all possible normal-form proofs:


$$Pf(A) \supseteq Nf(A) .$$


The set of all redundant formulae of a given presentation A will be denoted
as follows:
$$Red\,A \stackrel{!}{=} \{r \in A : A \succsim A \setminus \{r\}\} ,$$


and a presentation A is contracted if
$$Red\,A = \emptyset .$$
The following main result can then be derived [17]:
Theorem 1. A presentation is canonical iff it is saturated and contracted.


We now consider inference and deduction mechanisms. A deduction mechanism
$\leadsto$ is a function from presentations to presentations and we call the relation
$A \leadsto B$ a deduction step. A sequence of presentations $A_0 \leadsto A_1 \leadsto \cdots$ is called
a derivation. The result of the derivation is, as usual, its persisting formulae:

$$A_\infty \stackrel{!}{=} \liminf_{j \to \infty} A_j = \bigcup_{j > 0} \bigcap_{i > j} A_i .$$


A deduction mechanism $\leadsto$ is sound if $A \leadsto B$ implies $Th\,B \subseteq Th\,A$. It is
adequate if $A \leadsto B$ implies $Th\,A \subseteq Th\,B$. It is good if proofs only get better,
that is, if $A \leadsto B$ implies $A \succsim B$.


A derivation $A_0 \leadsto A_1 \leadsto \cdots$ is good if $A_i \succsim A_{i+1}$ for all i.
We now extend the notions of saturation and contraction to derivations:


— A derivation $\{A_i\}_i$ is saturating if $A_\infty$ is saturated.
— It is contracting if $A_\infty$ is contracted.
— It is canonical if both saturating and contracting.


A canonical derivation can be used to build the canonical presentation of the 
initial presentation: 


Theorem 2. A good derivation is canonical if and only if


$$A_\infty = A_0^\sharp .$$


2.2 The Standard Completion


The standard completion algorithm was first introduced by Knuth and Bendix
in [31], hence the name Knuth-Bendix completion by which it is often known. Its
correctness was first shown by Huet in [26], using a fairness hypothesis. We use here
a presentation of this algorithm as inference rules (see Fig. 1), as can be found in [3].
For basics on rewritings
and completions, we refer to [2] B9]. 

The Knuth-Bendix algorithm consists of six rules which apply to a pair $(E, R)$
of a set of equational axioms and a set of rewrite rules. It takes a reduction
ordering $\gg$ over terms as argument. The rules are presented in Fig. 1.


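Deduce: If $(s, t)$ is a critical pair of $R$
$E, R \leadsto E \cup \{s = t\}, R$


Orient: If $s \gg t$
$E \cup \{s = t\}, R \leadsto E, R \cup \{s \to t\}$


Delete:
$E \cup \{s = s\}, R \leadsto E, R$


Simplify: If $s \to_R u$
$E \cup \{s = t\}, R \leadsto E \cup \{u = t\}, R$


Compose: If $t \to_R u$
$E, R \cup \{s \to t\} \leadsto E, R \cup \{s \to u\}$


Collapse$^a$: If $s \to u$ by a rule $v \to w \in R$ and $s \blacktriangleright v$
$E, R \cup \{s \to t\} \leadsto E \cup \{u = t\}, R$


Fig. 1. Standard Completion Inference Rules.


$^a$ $\blacktriangleright$ designates the encompassment ordering: $s \blacktriangleright t$ if a subterm of $s$ is an instance of $t$
but not vice versa.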


Since , standard completion is associated with a fairness assumption (see
[3, Lemma 2.8]): at the limit, all equations are oriented ($E_\infty = \emptyset$) and all
persistent critical pairs coming from $R_\infty$ are treated by Deduce at least once.
Because we work with terms with variables, the reduction ordering $\gg$ cannot
be total, so that Orient may fail. Therefore, the standard completion algorithm
may either:
— terminate with success and yield a terminating, confluent set of rules;
— terminate with failure; or
— not terminate.

Here, the completeness of standard completion will only be shown using the
ACSI framework for the first case.


3 Proof Representations 


Our goal is now to use the ACSI framework to directly show that the standard
completion inference rules are correct and complete. We therefore first have to
find the right ordering on proofs. There are two main candidate proof
representations, which we now define and relate.


3.1 Proof Terms 


Let us first consider the proof representation coming from the one used in
rewriting logic (introduced by Meseguer [34], see also [30]). Consider a signature $\Sigma$
and a set of variables $V$. The set of terms built upon this signature and these
variables is denoted $\mathcal{T}(\Sigma, V)$. Consider also a set of equational axioms $E$ and a set of
rewrite rules $R$ based on this signature. To simplify the notation of proof terms,
equational axioms and rewrite rules are represented by labels not appearing in
the signature $\Sigma$. An equational axiom or a rewrite rule $(l, r) \in E \cup R$ will
also be written $(l(x_1, \ldots, x_n), r(x_1, \ldots, x_n))$ where $x_1, \ldots, x_n$ are the free variables
of both sides. We consider the rules of equational logic given in Fig. 2.
These inference rules define the proof term associated with a proof. The notation
$\pi : t \to t'$ means that $\pi$ is a proof term (which could also be seen as a trace)
showing that the term $t$ can be rewritten to the term $t'$.

By definition, $\mathcal{T}(\Sigma, V)$ is embedded into the proof terms when they are formed
with the rules Reflexivity and Congruence. Also, Reflexivity for $t \to t$ is
not essential because it can be replaced by a tree of Congruence steps isomorphic to
$t$. The associated proof terms are furthermore the same in both cases: $t$. Notice
that these proof terms are a restricted form of rho-terms [13].


Example 1. Consider the rewrite rules and equational axiom


$\ell_1 : g(x) \to d(x)$, $\ell_2 : s = t$, $\ell_3 : l \to r$,


— $r$ is a proof term of $r = r$,
— $f(\ell_1(\ell_2), (\ell_3; r)^{-1})$ is a proof term of $f(g(s), r) = f(d(t), l)$.
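
As a minimal sketch of how the inference rules of Fig. 2 assign a source and a target to a proof term, the Python function below computes both for ground proof terms. The tuple encoding, the names `RULES` and `endpoints`, and the representation of rule sides as Python functions are assumptions made for this illustration only; they are not part of the formalism above.

```python
# Terms and proof terms are nested tuples: (symbol, child_1, ..., child_n).
# ";" stands for transitivity, "inv" for symmetry; any symbol found in RULES
# is a rule or axiom label. These encodings are illustrative assumptions.

# label -> (left-hand side builder, right-hand side builder)
RULES = {
    "l1": (lambda x: ("g", x), lambda x: ("d", x)),  # l1 : g(x) -> d(x)
    "l2": (lambda: ("s",), lambda: ("t",)),          # l2 : s = t
    "l3": (lambda: ("l",), lambda: ("r",)),          # l3 : l -> r
}


def endpoints(pi):
    """Return (t, t') such that pi : t -> t' according to Fig. 2."""
    head, args = pi[0], pi[1:]
    if head == ";":                       # Transitivity
        (s1, t1), (s2, t2) = endpoints(args[0]), endpoints(args[1])
        if t1 != s2:
            raise ValueError("ill-formed composition")
        return s1, t2
    if head == "inv":                     # Symmetry
        s, t = endpoints(args[0])
        return t, s
    ends = [endpoints(a) for a in args]
    sources = [s for s, _ in ends]
    targets = [t for _, t in ends]
    if head in RULES:                     # Replacement
        lhs, rhs = RULES[head]
        return lhs(*sources), rhs(*targets)
    # Congruence (Reflexivity is the case where no label occurs below)
    return (head, *sources), (head, *targets)


# The proof term f(l1(l2), (l3; r)^-1) of Example 1
pi = ("f", ("l1", ("l2",)), ("inv", (";", ("l3",), ("r",))))
source, target = endpoints(pi)
```

Under these assumptions, `source` and `target` encode $f(g(s), r)$ and $f(d(t), l)$, the two sides of the equation proved in Example 1.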


Some proof terms defined here are "essentially the same". For instance, the
transitivity operator should be considered as associative, so that the proofs
$(\pi_1; \pi_2); \pi_3$ and $\pi_1; (\pi_2; \pi_3)$ are equal. This can be done by quotienting the proof
term algebra by the congruence rules of Fig. 3. In particular, in proof terms,
parallel rewriting can be combined in one term without transitivity. The Parallel
Moves Lemma equivalence corresponds to the fact that this parallel rewriting
can be decomposed by applying first the outermost rule, then the innermost, or
conversely. (About the Parallel Moves Lemma, see for instance [27].)

Reflexivity:
$$t : t \to t$$


Congruence:
$$\frac{\pi_1 : t_1 \to t_1' \quad \ldots \quad \pi_n : t_n \to t_n'}{f(\pi_1, \ldots, \pi_n) : f(t_1, \ldots, t_n) \to f(t_1', \ldots, t_n')}$$


Replacement: For all rules or equational axioms
$\ell = (g(x_1, \ldots, x_n), d(x_1, \ldots, x_n)) \in E \cup R$,
$$\frac{\pi_1 : t_1 \to t_1' \quad \ldots \quad \pi_n : t_n \to t_n'}{\ell(\pi_1, \ldots, \pi_n) : g(t_1, \ldots, t_n) \to d(t_1', \ldots, t_n')}$$


Transitivity:
$$\frac{\pi_1 : t_1 \to t_2 \quad \pi_2 : t_2 \to t_3}{\pi_1; \pi_2 : t_1 \to t_3}$$


Symmetry:
$$\frac{\pi : t_1 \to t_2}{\pi^{-1} : t_2 \to t_1}$$


Fig. 2. Inference Rules for Equational Logic


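Example 2. From the rules Associativity, Identities and Inverse we
can deduce that the proofs $(\pi_1; \pi_2)^{-1}$ and $\pi_2^{-1}; \pi_1^{-1}$ are equivalent. Writing
$\pi_1 : t_1 \to t_2$ and $\pi_2 : t_2 \to t_3$, we have:
$$\begin{aligned}
(\pi_1; \pi_2)^{-1} &\equiv (\pi_1; \pi_2)^{-1}; t_1 \\
&\equiv (\pi_1; \pi_2)^{-1}; \pi_1; \pi_1^{-1} \\
&\equiv (\pi_1; \pi_2)^{-1}; \pi_1; t_2; \pi_1^{-1} \\
&\equiv (\pi_1; \pi_2)^{-1}; \pi_1; \pi_2; \pi_2^{-1}; \pi_1^{-1} \\
&\equiv t_3; \pi_2^{-1}; \pi_1^{-1} \\
&\equiv \pi_2^{-1}; \pi_1^{-1}
\end{aligned}$$


We similarly have $f(\pi_1, \ldots, \pi_n)^{-1}$ equivalent to $f(\pi_1^{-1}, \ldots, \pi_n^{-1})$, because,
writing $\pi_i : t_i \to t_i'$,
$$\begin{aligned}
f(\pi_1^{-1}, \ldots, \pi_n^{-1}) &\equiv f(\pi_1^{-1}, \ldots, \pi_n^{-1}); f(t_1, \ldots, t_n) \\
&\equiv f(\pi_1^{-1}, \ldots, \pi_n^{-1}); f(\pi_1, \ldots, \pi_n); f(\pi_1, \ldots, \pi_n)^{-1} \\
&\equiv f(\pi_1^{-1}; \pi_1, \ldots, \pi_n^{-1}; \pi_n); f(\pi_1, \ldots, \pi_n)^{-1} \\
&\equiv f(t_1', \ldots, t_n'); f(\pi_1, \ldots, \pi_n)^{-1} \\
&\equiv f(\pi_1, \ldots, \pi_n)^{-1} .
\end{aligned}$$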


3.2 Proofs by Replacement of Equal by Equal 


This proof representation was introduced by [3] to prove the completeness of the
Knuth-Bendix completion algorithm, using an ordering over such proofs that
decreases for every completion step.

Associativity: For all proof terms $\pi_1, \pi_2, \pi_3$,
$$\pi_1; (\pi_2; \pi_3) \equiv (\pi_1; \pi_2); \pi_3$$


Identities: For all proof terms $\pi : t \to t'$,
$$\pi; t' \equiv \pi \qquad t; \pi \equiv \pi$$


Preservation of Composition: For all proof terms $\pi_1, \ldots, \pi_n, \pi_1', \ldots, \pi_n'$, for all
function symbols $f$,
$$f(\pi_1; \pi_1', \ldots, \pi_n; \pi_n') \equiv f(\pi_1, \ldots, \pi_n); f(\pi_1', \ldots, \pi_n')$$


Parallel Moves Lemma: For all rewrite rules or equational axioms $\ell =
(g(x_1, \ldots, x_n), d(x_1, \ldots, x_n)) \in E \cup R$, for all proof terms $\pi_1 : t_1 \to t_1', \ldots, \pi_n :
t_n \to t_n'$,
$$\ell(\pi_1, \ldots, \pi_n) \equiv \ell(t_1, \ldots, t_n); d(\pi_1, \ldots, \pi_n) \equiv g(\pi_1, \ldots, \pi_n); \ell(t_1', \ldots, t_n')$$


Inverse: For all proof terms $\pi : t \to t'$,
$$\pi; \pi^{-1} \equiv t \qquad \pi^{-1}; \pi \equiv t'$$


Fig. 3. Equivalence of Proof Terms


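An equational proof step is an expression $s \overset{p}{\underset{e}{\longleftrightarrow}} t$ where $s$ and $t$ are terms, $e$
is an equational axiom $u = v$, and $p$ is a position of $s$ such that $s|_p = \sigma(u)$ and
$t = s[\sigma(v)]_p$ for some substitution $\sigma$.
An equational proof of $s_0 = t_n$ is any finite sequence of equational proof steps
$\left(s_i \overset{p_i}{\underset{e_i}{\longleftrightarrow}} t_i\right)_{i \in \{0, \ldots, n\}}$ such that $t_i = s_{i+1}$ for all $i \in \{0, \ldots, n-1\}$. It is written:
$$s_0 \overset{p_0}{\underset{e_0}{\longleftrightarrow}} s_1 \overset{p_1}{\underset{e_1}{\longleftrightarrow}} s_2 \cdots s_n \overset{p_n}{\underset{e_n}{\longleftrightarrow}} t_n .$$


A rewrite proof step is an expression $s \overset{p}{\underset{\ell}{\longrightarrow}} t$ or $t \overset{p}{\underset{\ell}{\longleftarrow}} s$ where $s$ and $t$ are
terms, $\ell$ is a rewrite rule $u \to v$, and $p$ is a position of $s$ such that $s|_p = \sigma(u)$
and $t = s[\sigma(v)]_p$ for some substitution $\sigma$.
A proof by replacement (of equal by equal) of $s_0 = t_n$ is any finite sequence
of equational proof steps and rewrite proof steps $\left(s_i \overset{p_i}{\underset{\ell_i}{S_i}} t_i\right)_{i \in \{0, \ldots, n\}}$
where $S_i \in \{\longleftrightarrow, \longrightarrow, \longleftarrow\}$ for $i \in \{0, \ldots, n\}$ and such that $t_i = s_{i+1}$ for
all $i \in \{0, \ldots, n-1\}$. It is written:
$$s_0 \overset{p_0}{\underset{\ell_0}{S_0}} s_1 \overset{p_1}{\underset{\ell_1}{S_1}} s_2 \cdots s_n \overset{p_n}{\underset{\ell_n}{S_n}} t_n .$$

Example 3. Consider the rewrite rules and equational axiom:
$\ell_1 : g(x) \to d(x)$, $\ell_2 : s = t$, $\ell_3 : l \to r$,


— $r$ is a proof by replacement of $r = r$ (empty sequence),
— $f(g(s), r) \underset{\ell_1}{\longrightarrow} f(d(s), r) \underset{\ell_2}{\longleftrightarrow} f(d(t), r) \underset{\ell_3}{\longleftarrow} f(d(t), l)$ is a proof by replacement
of $f(g(s), r) = f(d(t), l)$.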


3.3 From Proof Terms to Proofs by Replacement 


In order to have a one-to-one correspondence between proof representations, we
use the equivalence of proof terms defined in Fig. 3. We can refine it to the
proof term rewrite system $\rightsquigarrow$ given in Fig. 4, in which $\pi, \pi', \pi_1, \ldots$ range over
proof terms, $t, t', t_1, \ldots$ over $\Sigma$-terms, $f, g, d$ over function symbols, $\ell$ over rule
and equational axiom labels, and $i$, $j$ and $k$ over $\{1, \ldots, n\}$.


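Delete Useless Identities: If $\pi : t \to t'$,
$$\pi; t' \rightsquigarrow \pi \qquad t; \pi \rightsquigarrow \pi$$

Sequentialization: If $\pi_i : t_i \to t_i'$ for all $i$, and there exist $i \neq j \in \{1, \ldots, n\}$ such that
$\pi_i \neq t_i$ and $\pi_j \neq t_j$,
$$f(\pi_1, \ldots, \pi_n) \rightsquigarrow f(\pi_1, t_2, \ldots, t_n); f(t_1', \pi_2, \ldots, t_n); \ldots; f(t_1', t_2', \ldots, \pi_n)$$


Composition Shallowing: If $\pi_i : t_i \to t_i'$ and $\pi_i' : t_i' \to t_i''$,
$$f(t_1, \ldots, \pi_i; \pi_i', \ldots, t_n) \rightsquigarrow f(t_1, \ldots, \pi_i, \ldots, t_n); f(t_1, \ldots, \pi_i', \ldots, t_n)$$


Parallel Moves: If $\ell = (g(x_1, \ldots, x_n), d(x_1, \ldots, x_n))$, $\pi_1 : t_1 \to t_1', \ldots, \pi_n :
t_n \to t_n'$, and if there exists $i \in \{1, \ldots, n\}$ such that $\pi_i \neq t_i$,
$$\ell(\pi_1, \ldots, \pi_n) \rightsquigarrow \ell(t_1, \ldots, t_n); d(\pi_1, \ldots, \pi_n)$$


Delete Useless Inverses:
$$t^{-1} \rightsquigarrow t$$


Inverse Congruence: If $\pi_i : t_i \to t_i'$,
$$f(t_1, \ldots, \pi_i^{-1}, \ldots, t_n) \rightsquigarrow f(t_1, \ldots, \pi_i, \ldots, t_n)^{-1}$$


Inverse Composition:
$$(\pi_1; \pi_2)^{-1} \rightsquigarrow \pi_2^{-1}; \pi_1^{-1}$$


Fig. 4. Rewrite System for Proof Terms

Associativity is still considered in the congruence, so that all proof term
rewrite rules must be considered modulo the associativity of $;$ which will be denoted
$\sim$. The class rewrite system that we consider will therefore be denoted $\rightsquigarrow / \sim$. As
it is linear, we can use the framework and results from [25].

We first prove that this rewrite system is included in the equivalence relation
of Fig. 3.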


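Proposition 2 (Correctness). For all proof terms $\pi_1, \pi_2$, if $\pi_1 \rightsquigarrow \pi_2$ then
$\pi_1 \equiv \pi_2$.


The converse is false: for instance $f(\ell_1, \ell_2) \equiv f(t_1, \ell_2); f(\ell_1, t_2')$ but we do
not have $f(\ell_1, \ell_2) \rightsquigarrow f(t_1, \ell_2); f(\ell_1, t_2')$.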


Proposition 3 (Termination and Confluence). The proof term rewrite system
$\rightsquigarrow$ modulo $\sim$ is terminating and confluent modulo $\sim$.


The proof term rewrite system $\rightsquigarrow$ allows us to give a correspondence between
proof terms and proofs by replacement of equal by equal: normal forms of proof 
terms correspond exactly to proofs by replacement. This fact is expressed in 
the following theorem, which is indeed a generalization of Lemma 3.6 in for 
equational logic. We also have operationalized the way to construct the chain of 
“one-step sequential rewrites”. 


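Theorem 3 (Correspondence between Proof Representations). The
normal form of a proof term $\pi$ for the rewrite system $\rightsquigarrow$, denoted $nf(\pi)$, has
the following form: for some $n \in \mathbb{N}$, some contexts $w_1[], \ldots, w_n[]$, some
indices $i_1, \ldots, i_n \in \{-1, 1\}$, some rule labels $\ell_1, \ldots, \ell_n$ and some terms
$t^1_1, \ldots, t^1_{m_1}, \ldots, t^n_1, \ldots, t^n_{m_n}$,


$$nf(\pi) = \left(w_1[\ell_1(t^1_1, \ldots, t^1_{m_1})]\right)^{i_1}; \ldots; \left(w_n[\ell_n(t^n_1, \ldots, t^n_{m_n})]\right)^{i_n}$$


where for all proof terms $\nu$, $\nu^1$ is a notation for $\nu$.


Such a proof term corresponds to the following proof by replacement of equal
by equal:


$$w_1[g_1(t^1_1, \ldots, t^1_{m_1})] \overset{p_1}{\underset{\ell_1}{S_1}} w_1[d_1(t^1_1, \ldots, t^1_{m_1})] \overset{p_2}{\underset{\ell_2}{S_2}} \cdots \overset{p_n}{\underset{\ell_n}{S_n}} w_n[d_n(t^n_1, \ldots, t^n_{m_n})]$$


where for all $j \in \{1, \ldots, n\}$ we have:

— $\ell_j = (g_j, d_j)$,

— $p_j$ is the position of $[]$ in $w_j[]$,

— $S_j = \longrightarrow$ if $i_j = 1$ and $\ell_j \in R$,
  $S_j = \longleftarrow$ if $i_j = -1$ and $\ell_j \in R$,
  $S_j = \longleftrightarrow$ if $\ell_j \in E$,

— if $j \neq n$, $w_j[d_j(t^j_1, \ldots, t^j_{m_j})] = w_{j+1}[g_{j+1}(t^{j+1}_1, \ldots, t^{j+1}_{m_{j+1}})]$.

Example 4. Consider $\pi = f(\ell_1(\ell_2), (\ell_3; r)^{-1})$ where $\ell_1 : g(x) \to d(x)$, $\ell_2 : s = t$,
$\ell_3 : l \to r$. We have:


$$\begin{aligned}
\pi &\rightsquigarrow f(\ell_1(s); d(\ell_2), (\ell_3; r)^{-1}) && \text{(Parallel Moves)} \\
&\rightsquigarrow f(\ell_1(s); d(\ell_2), r); f(d(t), (\ell_3; r)^{-1}) && \text{(Sequentialization)} \\
&\rightsquigarrow f(\ell_1(s); d(\ell_2), r); f(d(t), r^{-1}; \ell_3^{-1}) && \text{(Inverse Composition)} \\
&\rightsquigarrow f(\ell_1(s); d(\ell_2), r); f(d(t), r; \ell_3^{-1}) && \text{(Delete Useless Inverses)} \\
&\rightsquigarrow f(\ell_1(s); d(\ell_2), r); f(d(t), \ell_3^{-1}) && \text{(Delete Useless Identities)} \\
&\rightsquigarrow f(\ell_1(s), r); f(d(\ell_2), r); f(d(t), \ell_3^{-1}) && \text{(Composition Shallowing)} \\
&\rightsquigarrow f(\ell_1(s), r); f(d(\ell_2), r); f(d(t), \ell_3)^{-1} && \text{(Inverse Congruence)}
\end{aligned}$$


This last term is the normal form proof term, and it is equivalent to the proof
by replacement $f(g(s), r) \underset{\ell_1}{\longrightarrow} f(d(s), r) \underset{\ell_2}{\longleftrightarrow} f(d(t), r) \underset{\ell_3}{\longleftarrow} f(d(t), l)$.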


Due to this theorem, normal forms of proof terms can be considered in the 
following indifferently as proof terms or as proofs by replacement. 
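
The correspondence can be made concrete with a small Python sketch that turns a ground proof term directly into its chain of one-step rewrites, following the order imposed by Parallel Moves and Sequentialization. It does not implement the rewrite system of Fig. 4 rule by rule. The tuple encoding, the names `RULES`, `lift` and `chain`, and the restriction to rules whose variables each occur exactly once on the right-hand side are assumptions of this illustration.

```python
# Illustrative encoding (an assumption, not the paper's notation): terms and
# proof terms are nested tuples (symbol, child_1, ..., child_n); ";" is
# transitivity and "inv" is symmetry.

# label -> (lhs builder, rhs builder, True if the label is an equational axiom)
RULES = {
    "l1": (lambda x: ("g", x), lambda x: ("d", x), False),  # l1 : g(x) -> d(x)
    "l2": (lambda: ("s",), lambda: ("t",), True),           # l2 : s = t
    "l3": (lambda: ("l",), lambda: ("r",), False),          # l3 : l -> r
}
FLIP = {"->": "<-", "<-": "->", "<->": "<->"}


def lift(build, done, todo, i, steps):
    """Replay the steps of argument i inside the context given by `build`."""
    def ctx(x):
        return build(*done[:i], x, *todo[i + 1:])
    return [(ctx(u), label, arrow, ctx(v)) for u, label, arrow, v in steps]


def chain(pi):
    """Return (source, target, steps); a step is (before, label, arrow, after)."""
    head, args = pi[0], pi[1:]
    if head == ";":
        s1, t1, st1 = chain(args[0])
        s2, t2, st2 = chain(args[1])
        assert t1 == s2, "ill-formed composition"
        return s1, t2, st1 + st2
    if head == "inv":
        s, t, st = chain(args[0])
        return t, s, [(v, l, FLIP[a], u) for u, l, a, v in reversed(st)]
    subs = [chain(a) for a in args]
    srcs = [s for s, _, _ in subs]
    tgts = [t for _, t, _ in subs]
    if head in RULES:
        lhs, rhs, is_axiom = RULES[head]
        # Parallel Moves: apply the rule first, then rewrite below its rhs.
        steps = [(lhs(*srcs), head, "<->" if is_axiom else "->", rhs(*srcs))]
        build, source, target = rhs, lhs(*srcs), rhs(*tgts)
    else:
        def build(*xs):
            return (head, *xs)
        steps, source, target = [], (head, *srcs), (head, *tgts)
    # Sequentialization: rewrite the arguments one after the other.
    for i, (_, _, st) in enumerate(subs):
        steps += lift(build, tgts, srcs, i, st)
    return source, target, steps


# Proof term of Example 4: f(l1(l2), (l3; r)^-1)
pi = ("f", ("l1", ("l2",)), ("inv", (";", ("l3",), ("r",))))
source, target, steps = chain(pi)
```

On the proof term of Example 4, the computed steps are labelled l1, l2, l3 with arrows `->`, `<->`, `<-`, which agrees with the proof by replacement obtained there. Positions are not tracked in this sketch.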


3.4 Proofs Ordering 


Bachmair's representation by means of proofs by replacement was defined
to introduce an ordering on proofs [3]: given a reduction ordering $\gg$, a cost is
associated with each single proof step $s \overset{p}{\underset{\ell}{S}} t$. The cost of an equational proof step
$s \overset{p}{\underset{u = v}{\longleftrightarrow}} t$ is the triple $(\{s, t\}, u, t)$. The cost of a rewrite proof step
$s \overset{p}{\underset{u \to v}{\longrightarrow}} t$ is $(\{s\}, u, t)$.
Proof steps are compared with each other according to their cost, using the
lexicographic combination of the multiset extension $\gg_{mul}$ of the reduction ordering
over terms in the first component, the encompassment ordering $\blacktriangleright$ on the second
component, and the reduction ordering $\gg$ on the last component. Proofs are
compared as multisets of their proof steps. For two proofs by replacement $p, q$,
we will write $p >_{rep} q$ if $p$ is greater than $q$ for such an ordering.

Using Theorem 3 we can translate Bachmair's proof ordering to proof terms:


Definition 1 (Bachmair's Ordering on Proof Terms).
For all proof terms $\pi_1, \pi_2$, we say that $\pi_1 >_B \pi_2$ iff


$$nf(\pi_1) >_{rep} nf(\pi_2) .$$


Example 5. Suppose we have $\Sigma = \{f^1, a^0, b^0, c^0\}$ where the exponents of function
symbols denote their arity, and a precedence $f > a > b > c$.

Consider $\pi_1 = f(\ell_1^{-1}; \ell_2)$ and $\pi_2 = f(\ell_3)$ where $\ell_1 = a \to b$, $\ell_2 = a \to c$ and
$\ell_3 = b = c$, and suppose $a \gg b \gg c$.

We have $nf(\pi_1) = f(b) \underset{\ell_1}{\longleftarrow} f(a) \underset{\ell_2}{\longrightarrow} f(c)$ and $nf(\pi_2) = f(b) \underset{\ell_3}{\longleftrightarrow} f(c)$. The
cost of $nf(\pi_1)$ is $\{(\{f(a)\}, a, f(b)), (\{f(a)\}, a, f(c))\}$, the cost of $nf(\pi_2)$ is
$\{(\{f(b), f(c)\}, b, f(c))\}$, so $nf(\pi_1) >_{rep} nf(\pi_2)$ and $\pi_1 >_B \pi_2$.
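
The comparison in Example 5 can be replayed mechanically. The following Python sketch is only an illustration under stated assumptions: terms are ground and encoded as nested tuples, the reduction ordering is taken to be the lexicographic path ordering induced by the precedence $f > a > b > c$, and on ground terms strict encompassment is the proper subterm relation. The names `PREC`, `lpo_gt`, `mul_gt`, `cost`, `cost_gt` and `rep_gt` are hypothetical.

```python
from collections import Counter

# Assumptions of this sketch: ground terms as nested tuples
# (symbol, child_1, ..., child_n) and LPO as the reduction ordering,
# with the precedence f > a > b > c of Example 5.
PREC = {"f": 3, "a": 2, "b": 1, "c": 0}


def lpo_gt(s, t):
    """Lexicographic path ordering on ground terms."""
    if any(si == t or lpo_gt(si, t) for si in s[1:]):
        return True
    if PREC[s[0]] > PREC[t[0]]:
        return all(lpo_gt(s, tj) for tj in t[1:])
    if s[0] == t[0] and all(lpo_gt(s, tj) for tj in t[1:]):
        for si, ti in zip(s[1:], t[1:]):
            if si != ti:
                return lpo_gt(si, ti)
    return False


def encompass_gt(s, t):
    """Strict encompassment; on ground terms, t is a proper subterm of s."""
    return any(si == t or encompass_gt(si, t) for si in s[1:])


def mul_gt(gt, m, n):
    """Multiset extension of the strict ordering gt."""
    m_only, n_only = Counter(m) - Counter(n), Counter(n) - Counter(m)
    if not m_only and not n_only:
        return False
    return all(any(gt(x, y) for x in m_only) for y in n_only)


def cost(step):
    """step = (s, (u, v), kind, t), kind in {'eq', 'rw'}, rewriting s to t."""
    s, (u, _v), kind, t = step
    first = tuple(sorted((s, t))) if kind == "eq" else (s,)
    return (first, u, t)


def cost_gt(c1, c2):
    """Lexicographic comparison of two step costs."""
    (m1, u1, t1), (m2, u2, t2) = c1, c2
    if mul_gt(lpo_gt, m1, m2):
        return True
    if Counter(m1) != Counter(m2):
        return False
    return encompass_gt(u1, u2) or (u1 == u2 and lpo_gt(t1, t2))


def rep_gt(p, q):
    """p >_rep q: proofs are compared as multisets of step costs."""
    return mul_gt(cost_gt, [cost(x) for x in p], [cost(x) for x in q])


a, b, c = ("a",), ("b",), ("c",)
fa, fb, fc = ("f", a), ("f", b), ("f", c)
nf_pi1 = [(fa, (a, b), "rw", fb), (fa, (a, c), "rw", fc)]  # f(b) <- f(a) -> f(c)
nf_pi2 = [(fb, (b, c), "eq", fc)]                          # f(b) <-> f(c)
result = rep_gt(nf_pi1, nf_pi2)
```

Under these assumptions `result` is true and the converse comparison fails, in agreement with $\pi_1 >_B \pi_2$. With non-ground rules, `encompass_gt` would have to be replaced by a test based on matching.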
As we can see, the way we define the ordering over proofs is not trivial. The
question remains whether we could have defined it more directly, without using the
representation as proofs by replacement. The following statement gives the beginning
of an answer: we cannot hope to extend an RPO on $\Sigma$-terms to an RPO$^4$ $>_{rpo}$ on
proof terms so that $>_B$ and $>_{rpo}$ coincide for the normal forms of proof terms:


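Counter-example 6. With the same hypotheses as in Example 5, let $\ell_f =
f(a) \to c$ and $\ell_b = b \to c$.
We now want to extend the precedence to $\ell_f$ and $\ell_b$, in order to extend the
RPO to proof terms. If we have $\ell_f < \ell_b$, then $f(a) \underset{\ell_f}{\longrightarrow} c >_{rep} b \underset{\ell_b}{\longrightarrow} c$ but $\ell_f <_{rpo} \ell_b$.

If we suppose $f > \ell_f > \ell_b$, we have $f(a) \underset{\ell_f}{\longrightarrow} c >_{rep} f(b) \underset{\ell_b}{\longrightarrow} f(c)$ but
$\ell_f <_{rpo} f(\ell_b)$.

If we suppose $\ell_f > \ell_b$ and $\ell_b > f$, then $f(f(b)) \underset{\ell_b}{\longrightarrow} f(f(c)) >_{rep} f(a) \underset{\ell_f}{\longrightarrow} c$
but $f(f(\ell_b)) <_{rpo} \ell_f$.

Such an extension is therefore impossible: there is no extension of $>_{rpo}$ to
proof terms such that for all proof terms $\pi_1, \pi_2$, we have $nf(\pi_1) >_{rpo} nf(\pi_2)$ if and
only if $nf(\pi_1) >_B nf(\pi_2)$.


In other words, the ordering we defined above cannot be defined as an RPO over
proof terms.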


In the following, proofs will be represented by proof terms, the proof ordering
$>$ between them will be the ordering $>_B$ restricted to proofs with the same
conclusion, and the subproof relation $\rhd$ will be the subterm relation.


4 Standard Completion Is an Instance of Abstract 
Canonical System 


4.1 Adequacy to the Postulates 


Adequacy to postulates A and D comes from the tree structure of the
proof term representation.

Postulate E is not trivially verified, because the ordering is defined
as the translation of an ordering over proofs by replacement. Nevertheless:


Theorem 4 (Postulate E for Equational Proofs). For all contexts $w[]$, for
all proof terms $q, r$:
$$q > r \text{ implies } w[q] > w[r] .$$


The deduction mechanism $\leadsto$ used here will of course be standard
completion. We now show that it has the required properties.


$^4$ Or better, an ordering compatible with associativity, such as the AC-RPO [38].

4.2 Standard Completion Is Sound and Adequate

This is shown in [3, Lemma 2.1]: if $E, R \leadsto E', R'$, then $\overset{*}{\underset{E \cup R}{\longleftrightarrow}}$ and $\overset{*}{\underset{E' \cup R'}{\longleftrightarrow}}$ are the
same. To prove this, one simply has to verify it for each inference rule of standard
completion.


4.3 Standard Completion Is Good 


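This is shown in [3, Lemmas 2.5 and 2.6]: if $E, R \leadsto E', R'$, then proofs in $E, R$ can
be transformed to proofs in $E', R'$ using the following rules:


$s \underset{E}{\longleftrightarrow} t \;\Longrightarrow\; s \underset{R}{\longrightarrow} t$ (Orient)
$s \underset{E}{\longleftrightarrow} t \;\Longrightarrow\; s \underset{R}{\longrightarrow} u \underset{E}{\longleftrightarrow} t$ (Simplify)
$s \underset{E}{\longleftrightarrow} s \;\Longrightarrow\; s$ (Delete)
$s \underset{R}{\longleftarrow} u \underset{R}{\longrightarrow} t \;\Longrightarrow\; s \underset{E}{\longleftrightarrow} t$ (Deduce)
$s \underset{R}{\longleftarrow} u \underset{R}{\longrightarrow} t \;\Longrightarrow\; s \overset{*}{\underset{R}{\longrightarrow}} v \overset{*}{\underset{R}{\longleftarrow}} t$
$s \underset{R}{\longrightarrow} t \;\Longrightarrow\; s \underset{R}{\longrightarrow} u \underset{R}{\longleftarrow} t$ (Compose)
$s \underset{R}{\longrightarrow} t \;\Longrightarrow\; s \underset{R}{\longrightarrow} u \underset{E}{\longleftrightarrow} t$ (Collapse)


We have $\Longrightarrow \subseteq >$, so these proofs indeed become better.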


4.4 Standard Completion Is Canonical 
We can now show the following theorem:


Theorem 5 (Completeness of Standard Completion). Standard completion
results, at the limit and when it terminates without failure, in the canonical,
Church-Rosser basis.


Proof. We can show $R_\infty = E_0^\sharp$, and because standard completion is good we can
use Theorem 2.


Remark 1. When standard completion does not terminate, we can show that
$E_0^\sharp = R_\infty^\sharp \subseteq R_\infty$. Consequently, the resulting set $R_\infty$ is then saturated, but it is
not necessarily contracted.


This shows that standard completion is an instance of the framework of
abstract canonical systems, when we choose a convenient proof
representation.

5 Conclusion


We presented a proof that standard completion can be seen as an instance of
the abstract canonical systems and inference framework. This led us to make
precise the relation between different equational proof representations. The first
one, proof terms as presented in [34], is convenient for considering proofs as terms,
with a subterm relation and substitutions. The other one, initiated in [3], is well
adapted to the study of the completeness of the standard completion procedure.
We presented a way to pass from one representation to the other by means of
the proof term rewrite rules presented in Fig. 4. Thanks to this, we extended
the ordering introduced with proofs by replacement to proof terms and
thus combined the advantages of both representations. This gives a positive
answer to the question whether abstract canonical systems, centered in a
quite general way around the notion of proof ordering, are indeed the right
framework to uniformly prove the completeness of completion.

We now plan to understand how the results presented here can be
extended to other completion procedures. Bachmair introduced another proof
ordering to prove the completeness of completion modulo [3], so that the
generalization seems rather natural. We also plan to look at other kinds of
deduction mechanisms, such as Buchberger's algorithm or resolution. For this, we
may show that Struth's non-symmetric completion [41], which subsumes both
procedures, is also an instance of the ACSI framework.

Furthermore, proof terms as presented in [34,30] are specific terms of the
rewriting calculus [13] (http://rho.loria.fr). The link between the completion
procedure and the sequent systems mentioned above can probably be found here
and be related to Dowek's work proving that confluent rewrite rules can be linked
with cut-free proofs of some sequent systems [19].

A Proofs for Section 3 and


A.1 From Proof Terms to Proof by Replacement 


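To prove the termination of $\rightsquigarrow / \sim$, we need a reduction ordering compatible with
associativity. We consider only associativity here, although most of the existing
works use associativity and commutativity. Therefore, we need the following
lemma.


Lemma 1. If $A \subseteq B$, then if $\succ$ is B-compatible, $\succ$ is A-compatible.


Proof. Just notice that $s' \overset{*}{\underset{A}{\longleftrightarrow}} s \succ t \overset{*}{\underset{A}{\longleftrightarrow}} t'$ implies $s' \overset{*}{\underset{B}{\longleftrightarrow}} s \succ t \overset{*}{\underset{B}{\longleftrightarrow}} t'$.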


We can therefore use the AC-RPO ordering: a total AC-compatible simplification
ordering on ground terms is defined in [38], as an extension of the RPO.
To compare terms, they are interpreted using flattening and interpretation rules.
As we consider here that the associative commutative symbols have the lowest
precedence, we do not need the interpretation rules, and we will only present
the flattening rules: terms are reduced using a set of rules


$$f(x_1, \ldots, x_n, f(y_1, \ldots, y_r), z_1, \ldots, z_m) \to f(x_1, \ldots, x_n, y_1, \ldots, y_r, z_1, \ldots, z_m) \qquad (1)$$

for all AC-symbols $f$ with $n + m \geq 1$ and $r \geq 2$. Such a rewrite system is
terminating as shown in [38].
For all terms $t$, let $snf(t)$ denote the set of normal forms of $t$ using rules (1).
Given a precedence $>$ on function symbols, let $>_{rpo}$ denote the recursive path
ordering with precedence $>$ where AC function symbols have multiset status and
other symbols have lexicographic status.
If $f(s_1, \ldots, s_n)$ is the normal form of a term $s$ rewriting by (1) only at the topmost
position, then $tf(s) = (s_1, \ldots, s_n)$.
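
As a small illustration of the flattening rules (1), the Python sketch below flattens nested occurrences of the composition symbol and computes the top-flattening. It assumes that terms are encoded as nested tuples, that `;` is the only flattened symbol and is binary in the input, and it returns a single normal form rather than a set; the names `FLAT`, `flatten` and `tf` are hypothetical.

```python
# Assumption of this sketch: terms are nested tuples (symbol, child_1, ...),
# and ";" is the only symbol subject to the flattening rules (1).
FLAT = {";"}


def flatten(term):
    """Flatten nested occurrences of a FLAT symbol everywhere in the term."""
    head = term[0]
    children = [flatten(c) for c in term[1:]]
    if head in FLAT:
        merged = []
        for c in children:
            # A child with the same head is spliced into the argument list.
            merged.extend(c[1:] if c[0] == head else [c])
        children = merged
    return (head, *children)


def tf(term):
    """Arguments of the term after flattening at the topmost position only."""
    head, args = term[0], list(term[1:])
    if head in FLAT:
        while any(a[0] == head for a in args):
            args = [b for a in args for b in (a[1:] if a[0] == head else (a,))]
    return tuple(args)


p1, p2, p3 = ("p1",), ("p2",), ("p3",)
left = (";", (";", p1, p2), p3)    # (p1; p2); p3
right = (";", p1, (";", p2, p3))   # p1; (p2; p3)
same = flatten(left) == flatten(right)
```

Both bracketings of a composition flatten to the same tuple here, which is the sense in which proof terms are handled modulo the associativity of `;`.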
Definition 2 (AC-RPO). For all terms $s, t$, $s >_{AC\text{-}rpo} t$ iff:


— $\forall t' \in snf(t)\ \exists s' \in snf(s),\ s' >_{rpo} t'$, or
— $\forall t' \in snf(t)\ \exists s' \in snf(s),\ s' \geq_{rpo} t'$ and $tf(s) = (s_1, \ldots, s_m)$ and $tf(t) =
(t_1, \ldots, t_n)$ and
• if the head of $s$ is AC then $\{s_1, \ldots, s_m\} >^{mul}_{AC\text{-}rpo} \{t_1, \ldots, t_n\}$, or
• if the head of $s$ is not AC then $(s_1, \ldots, s_m) >^{lex}_{AC\text{-}rpo} (t_1, \ldots, t_n)$.


Proposition 4 ([38]). The AC-RPO is an AC-compatible simplification ordering
which is total for non AC-equivalent ground terms.


We define a precedence $>$ such that for all function symbols $f$ and for all
rule labels $\ell$ we have $\ell > f > \cdot^{-1} > ;$. The AC-RPO built with this precedence
will be denoted $\succ$.

To show termination, we also need the following lemma:


Lemma 2. For all proof terms $\pi : t \to t'$, we have $\pi \succeq t$ and $\pi \succeq t'$.


Proof. By induction on the structure of the proof term $\pi$.

For Reflexivity, $\pi = t = t'$.

For Congruence, $\pi = f(\pi_1, \ldots, \pi_n)$, $t = f(t_1, \ldots, t_n)$ and $t' = f(t_1', \ldots, t_n')$.
By induction hypothesis, for all $i \in \{1, \ldots, n\}$, we have $\pi_i \succeq t_i, t_i'$. Furthermore,
$\pi$ is not reducible at the top position using rules (1), so that $snf(\pi) =
\{f(\pi_1', \ldots, \pi_n') : \forall i,\ \pi_i' \in snf(\pi_i)\}$, whereas $t$ and $t'$ are not reducible.
Consequently, by definition of an AC-RPO, $\pi \succeq t, t'$.

For Replacement, $\pi = \ell(\pi_1, \ldots, \pi_n)$, $t = g(t_1, \ldots, t_n)$ and $t' = d(t_1', \ldots, t_n')$
where $\ell = (g, d) \in E \cup R$. With the same arguments as for Congruence, we
can conclude that $\pi \succ t, t'$ (recall that $\ell > g, d$).

For Transitivity, $\pi = \pi_1; \pi_2$ where $\pi_1 : t \to t''$ and $\pi_2 : t'' \to t'$. By
induction hypothesis, $\pi_1 \succeq t$ and $\pi_2 \succeq t'$. As $\succ$ is a simplification ordering,
$\pi \succ \pi_1, \pi_2 \succeq t, t'$.
For Symmetry, $\pi = \pi'^{-1}$ where $\pi' : t' \to t$. By induction hypothesis and
because $\succ$ is a simplification ordering, $\pi \succ \pi' \succeq t, t'$.


Proposition 5 (Termination). The rewrite system $\rightsquigarrow$ of Fig. 4 modulo $\sim$ is
terminating for ground proof terms.


Proof. We can show that $\rightsquigarrow \subseteq \succ$, thus proving the termination of $\rightsquigarrow / \sim$:

For Delete Useless Identities, it comes from the fact that $\succ$ is a
simplification ordering.

For Sequentialization, rules (1) are not applicable
on the left side whereas they lead on the right side to
$;(f(\pi_1, t_2, \ldots, t_n), f(t_1', \pi_2, \ldots, t_n), \ldots, f(t_1', t_2', \ldots, \pi_n))$. We have $f > ;$,
thus by definition of an RPO, we must then prove that for all
$i \in \{1, \ldots, n\}$ we have $f(\pi_1, \ldots, \pi_n) >_{rpo} f(t_1', \ldots, t_{i-1}', \pi_i, t_{i+1}, \ldots, t_n)$,
i.e. $(\pi_1, \ldots, \pi_n) >^{lex}_{rpo} (t_1', \ldots, t_{i-1}', \pi_i, t_{i+1}, \ldots, t_n)$. By hypothesis there exists
at least one $j \in \{1, \ldots, n\} \setminus \{i\}$ such that $\pi_j \neq t_j$, so we can conclude with the
preceding lemma.

For Composition Shallowing, both sides are not reducible using rules
(1). We have $f > ;$, thus we have to show: $f(t_1, \ldots, \pi_i; \pi_i', \ldots, t_n) >_{rpo}
f(t_1, \ldots, \pi_i, \ldots, t_n)$ and $f(t_1, \ldots, \pi_i; \pi_i', \ldots, t_n) >_{rpo} f(t_1, \ldots, \pi_i', \ldots, t_n)$. Both
comparisons hold by definition of an RPO.

For Parallel Moves, both sides are not reducible using rules (1). We
have $\ell > ;$, thus we have to prove that $\ell(\pi_1, \ldots, \pi_n) >_{rpo} \ell(t_1, \ldots, t_n)$ and
$\ell(\pi_1, \ldots, \pi_n) >_{rpo} d(\pi_1, \ldots, \pi_n)$. The first comparison holds because of the
lemma and because there exists an $i \in \{1, \ldots, n\}$ such that $\pi_i \neq t_i$; the second
one holds because $\ell > d$.

For Delete Useless Inverses, this comes from the fact that $\succ$ is a
simplification ordering.

For Inverse Congruence, both sides are not reducible using rules (1),
therefore this is a consequence of $f > \cdot^{-1}$.

For Inverse Composition, both sides are not reducible using rules (1),
therefore this is a consequence of $\cdot^{-1} > ;$.


We can also prove confluence: 


Proposition 6 (Confluence). The rewrite system $\rightsquigarrow$ is confluent modulo $\sim$
on ground proof terms.


Proof. The class rewrite system is linear and terminating, so we just have to
check that the critical pairs are confluent [25].
For the critical pairs of $\rightsquigarrow$ with itself, it is easy to check for most of them that they are
confluent. We only detail the most problematic one. For two possible applications
of Sequentialization, we have for instance $f(g(\nu_1, \ldots, \nu_m), \pi_1, \ldots, \pi_n)$, with
$\nu_k : s_k \to s_k'$, that
can be rewritten to $f(g(\nu_1, \ldots, \nu_m), t_1, \ldots, t_n); f(g(s_1', \ldots, s_m'), \pi_1, \ldots, t_n); \ldots;
f(g(s_1', \ldots, s_m'), t_1', \ldots, \pi_n)$ and to $f(g(\nu_1, \ldots, s_m); \ldots; g(s_1', \ldots, \nu_m), \pi_1, \ldots, \pi_n)$.
Both of them reduce to $f(g(\nu_1, \ldots, s_m); \ldots; g(s_1', \ldots, \nu_m), t_1, \ldots, t_n);
f(g(s_1', \ldots, s_m'), \pi_1, \ldots, t_n); \ldots; f(g(s_1', \ldots, s_m'), t_1', \ldots, \pi_n)$.

For the critical pairs between $\rightsquigarrow$ and $\sim$, the only rules that can interfere with $\sim$ are Delete Useless
Identities, Composition Shallowing and Inverse Composition. We can
check that all critical pairs are confluent.


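Theorem 6 (Correspondence between Proof Representations). The
normal form of a proof term $\pi$ for the rewrite system $\rightsquigarrow$, denoted $nf(\pi)$, has
the following form: for some $n \in \mathbb{N}$, some contexts $w_1[], \ldots, w_n[]$, some
indices $i_1, \ldots, i_n \in \{-1, 1\}$, some rule labels $\ell_1, \ldots, \ell_n$ and some terms
$t^1_1, \ldots, t^1_{m_1}, \ldots, t^n_1, \ldots, t^n_{m_n}$,


$$nf(\pi) = \left(w_1[\ell_1(t^1_1, \ldots, t^1_{m_1})]\right)^{i_1}; \ldots; \left(w_n[\ell_n(t^n_1, \ldots, t^n_{m_n})]\right)^{i_n}$$


where $\nu^1$ is a notation for $\nu$.
We will denote by $nf(\pi)$ the normal form of a proof term $\pi$.


Such a proof term corresponds to the following proof by replacement of equal
by equal:


$$w_1[g_1(t^1_1, \ldots, t^1_{m_1})] \overset{p_1}{\underset{\ell_1}{S_1}} w_1[d_1(t^1_1, \ldots, t^1_{m_1})] \overset{p_2}{\underset{\ell_2}{S_2}} \cdots \overset{p_n}{\underset{\ell_n}{S_n}} w_n[d_n(t^n_1, \ldots, t^n_{m_n})]$$


where for all $j \in \{1, \ldots, n\}$ we have:


— $\ell_j = (g_j, d_j)$,
— $p_j$ is the position of $[]$ in $w_j[]$,
— $S_j = \longrightarrow$ if $i_j = 1$ and $\ell_j \in R$,
  $S_j = \longleftarrow$ if $i_j = -1$ and $\ell_j \in R$,
  $S_j = \longleftrightarrow$ if $\ell_j \in E$,
— if $j \neq n$, $w_j[d_j(t^j_1, \ldots, t^j_{m_j})] = w_{j+1}[g_{j+1}(t^{j+1}_1, \ldots, t^{j+1}_{m_{j+1}})]$.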




Proof. We first have to check that proof terms in that form are indeed irreducible
by $\rightsquigarrow$, which is left to the reader.

Then, suppose that we have an irreducible proof term. Because Sequentialization
cannot be applied, there is at most one $;$ under all function symbols.
Because Composition Shallowing cannot be applied, there is no $;$ under any
function symbol. Because Inverse Congruence and Inverse Composition
cannot be applied, $\cdot^{-1}$ is applied between $;$ and function symbols. Irreducible
proof terms are therefore applications of $;$ over possibly $\cdot^{-1}$ over base terms
composed of function symbols and rule labels.

Because Delete Useless Identities and Delete Useless Inverses cannot
be applied, there is at least one non-trivial proof (i.e. a proof with a label in
it) in each of these base terms. Because Sequentialization cannot be applied,
there is at most one non-trivial proof in each of them. Because Parallel Moves
cannot be applied, the subterms of the labels are $\Sigma$-terms. Consequently, each
base term contains one and only one rule label, applied to $\Sigma$-terms.


A.2 Adequacy to the Postulates 


Postulate A. The proof of $(u, v) \in E \cup R$ labeled by $\ell$ is $\ell(x_1, \ldots, x_n)$ where
$x_1, \ldots, x_n$ are the free variables of $(u, v)$.

Postulate B. We can replace the assumption $\ell(\pi_1, \ldots, \pi_n)$ of something proved
by its proof where the free variables are replaced by the proofs $\pi_1, \ldots, \pi_n$.


Postulates C and D. These postulates hold because of the tree structure of proofs.


Postulate E. This one does not trivially hold. We first show the following lemma:


Lemma 3. For all function symbols f of arity n + 1, for all proof terms
π1,...,πn, q and r:

q > r implies f(π1,...,q,...,πn) > f(π1,...,r,...,πn) .


Proof. Suppose q > r, thus by definition nf(q) >rep nf(r). To compare
f(π1,...,q,...,πn) and f(π1,...,r,...,πn), we have to transform them to proofs
by replacement. As → is Church-Rosser, the way it is applied does not matter.


we |v 
We have 
Fimea Tn) 
r Fanta. tn) Flt eee stn)i i FCs esta) 
oe Fantasta) fE DEl) rtn) i FC Tn) 


Then, if nf(q) contains ; the underlined term will be split by Composition
Shallowing. If it contains ⁻¹ the rule Inverse Congruence will be applied.
Some terms outside the underline corresponding to identity will be removed by
Delete Useless Identities, and the normal form will look like:


Pog iad Cero oiera) eG eroen) eae 


with nf(q) = q1^{i1}; ...; qm^{im}.

The same will apply with r, and therefore, to compare the initial proofs, we 
just have to compare the costs of the underlined terms. 

The cost of nf(q) will look like {({s1}, u1, h1),...,({sm}, um, hm)}. Then the
cost of f(t1,...,q1,...,tn)^{i1}; ...; f(t1,...,qm,...,tn)^{im} will be:


{ ({f(t1,...,s1,...,tn)}, u1, f(t1,...,h1,...,tn)), ...,
  ({f(t1,...,sm,...,tn)}, um, f(t1,...,hm,...,tn)) }


For nf(r) they will be respectively {({g1}, v1, d1),...,({gp}, vp, dp)} and:


{ ({f(t1,...,g1,...,tn)}, v1, f(t1,...,d1,...,tn)), ...,
  ({f(t1,...,gp,...,tn)}, vp, f(t1,...,dp,...,tn)) }


The ordering >, which is used to compare the first and the third components of
each part of the cost, is a reduction ordering, so that nf(q) >rep
nf(r) implies for instance f(t1,...,q1,...,tn)^{i1}; ...; f(t1,...,qm,...,tn)^{im} >rep
f(t1,...,r1,...,tn)^{j1}; ...; f(t1,...,rp,...,tn)^{jp}.


The same is true for labels: 


Lemma 4. For all rule labels ℓ, for all proof terms π1,...,πn, q and r:
q > r implies ℓ(π1,...,q,...,πn) > ℓ(π1,...,r,...,πn)


Proof. ℓ(π1,...,q,...,πn) and ℓ(π1,...,r,...,πn) can be reduced by Parallel
Moves to ℓ(t1,...,tn); d(π1,...,q,...,πn) and ℓ(t1,...,tn); d(π1,...,r,...,πn).
We can therefore conclude using the preceding lemma.


This allows us to show 


Theorem 7 (Postulate [E] for Equational Proofs). For all proof terms p, r,
for all positions i of p:
p|i > r implies p > p[r]i .


Proof. This is proved by induction on i. For i = ε this is trivial. For i ≠ ε, by
induction hypothesis, the result holds for the subproofs of p. For the head of p:


— for Symmetry, it is trivial;

— for Transitivity, it comes from the fact that equational proofs are compared
as the multiset of their equational proof steps;

— for Congruence, it comes from Lemma 3;

— for Replacement, it comes from Lemma 4.


□


A.3 Standard Completion Is Canonical

Remember that, by the fairness assumption, E∞ = ∅.


Lemma 5. For all standard completion derivations (Ei, Ri)i:
E0♯ ⊆ R∞ .


Proof. By contradiction, suppose there is (a,b) ∈ E0♯ \ R∞, labeled ℓ. Because
completion is adequate, there exists p ∈ μPf(R∞) proving a = b. Because a =
b ∈ E0♯, ℓ(x1,...,xn) ∈ Nf(E0) = Nf(R∞), where (xi)i are the free variables of
ℓ, so that

p > ℓ(x1,...,xn) .


— If there is no peak in nf(p), then nf(p) is a valley proof, and it is easy to
show that it is smaller than ℓ(x1,...,xn), which is a contradiction with the
preceding comparison.

— If there is a parallel peak, for instance s[c, e] ← s[d, e] → s[d, f],
then the proof by replacement where this peak is replaced by
s[c, e] → s[c, f] ← s[d, f] is smaller, thus leading to a contradiction with
the minimality of p in Pf(R∞).

— If there is a critical peak, then by the fairness assumption there is some step k
where this critical peak is treated by Deduce. The proof of the conclusion
of the critical peak at the step k + 1 is therefore smaller. Because standard
completion is good, it can only go smaller, so that at the limit we can find
by replacement of the critical peak by this proof a smaller proof of a = b,
thus leading to a contradiction with the minimality of p in Pf(R∞).


Lemma 6. For all standard completion derivations (Ei, Ri)i which terminate
without failure:
R∞ ⊆ E0♯ .


Proof. By contradiction, suppose there is (a,b) ∈ R∞ \ E0♯, labeled by ℓ. Then
there exists a proof p ∈ Pf(E0♯) such that ℓ(x1,...,xn) > p where x1,...,xn
are the free variables of ℓ.

Rules come from the orientation of equational axioms through Orient, so that
a > b. The cost of ℓ(x1,...,xn) is then {({a}, a, b)}. Consider the leftmost
step of nf(p). It is of the form a ↔ a[d]i, using (c,d), where c = a|i.

If it is a ← a[d]i then the cost of this proof step would be {({a[d]i}, d, a)},
which is then greater than {({a}, a, b)}, thus leading to a contradiction with
the fact that ℓ(x1,...,xn) > p.

If it is a ↔ a[d]i then the cost of this proof step would be {({a, a[d]i}, c, a[d]i)},
which is then greater than {({a}, a, b)}, thus leading to a contradiction with the
fact that ℓ(x1,...,xn) > p.

If it is a → a[d]i then there is a critical pair (b, a[d]i)
in R∞ (we just proved that E0♯ ⊆ R∞). The fairness assumption will therefore
apply, and therefore Deduce will produce the equational axiom b = a[d]i,
which will be oriented, and a → b ∈ R will be simplified through Compose
or Collapse. Because a → b is persisting, it must be generated once again, thus
contradicting the termination of the completion.


Theorem 8 (Completeness of Standard Completion). Standard comple- 
tion results — at the limit, when it terminates without failure — in the canonical, 
Church-Rosser basis. 


Proof. There is nothing more to prove, because we have Ræ = Ei, and standard 
completion is good so we can use Theorem P} 
Abstract. This paper gives a reduction-preserving translation from
Coquand’s dependent pattern matching [4] into a traditional type theory [11]
with universes, inductive types and relations and the axiom K [22].
This translation serves as a proof of termination for structurally recursive
pattern matching programs, provides an implementable compilation
technique in the style of functional programming languages, and
demonstrates the equivalence with a more easily understood type theory.


Dedicated to Professor Joseph Goguen on the occasion of his 65th birthday. 


1 Introduction 


Pattern matching is a long-established notation in functional programming [SIE9], 
combining discrimination on constructors and selection of their arguments safely, 
compactly and efficiently. Extended to dependent types by Coquand [4], pattern 
matching becomes still more powerful, managing more complexity as we move 
from simple inductive datatypes, like Nat defined as follows, 


Nat : ⋆ = zero : Nat | suc (n : Nat) : Nat


to work with inductive families of datatypes [6] like Fin, which is indexed over 
Nat (Fin n is an n element enumeration), or Fin’s ordering relation, <, indexed 
over indexed data 


Fin (n : Nat) : ⋆ = fz_n : Fin (suc n)
                  | fs_n (i : Fin n) : Fin (suc n)

(i : Fin n) ≤_n (j : Fin n) : ⋆ = leqz_{n;j} : fz_n ≤_(suc n) j
                                | leqs_{n;i;j} (p : i ≤_n j) : fs_n i ≤_(suc n) fs_n j


Pattern matching can make programs and proofs defined over such structures
just as simple as for their simply-typed analogues. For example, the proof of
transitivity for ≤ works just the same for Fin as for Nat:


trans (p : i ≤ j; q : j ≤ k) : i ≤ k
trans leqz_{n;j} q ↦ leqz_{n;k}
trans (leqs_{n;i;j} p′) (leqs_{n;j;k} q′) ↦ leqs_{n;i;k} (trans p′ q′)





t Here we write as subscripts arguments which are usually inferrable; informally, and 
in practice, we omit them entirely. 


K. Futatsugi et al. (Eds.): Goguen Festschrift, LNCS 4060, pp. 521-540, 2006.

There is no such luxury in a traditional type theory [14,20], where a datatype
is equipped only with an elimination constant whose type expresses its induction 
principle and whose operational behaviour is primitive recursion. This paper 
provides a translation from dependent pattern matching in Coquand’s sense to 
such a type theory, namely Luo’s UTT [11], extended with the Altenkirch-Streicher K
axiom ‘uniqueness of identity proofs’ [22]. Coquand observed that his rules admit 
K; Hofmann and Streicher have shown that K does not follow from the usual 
induction principle for the identity relation [9]. We show that (a variant of) K is 
sufficient to bridge the gap: it lets us encode the constructor-based unification 
which Coquand built directly into his rules. 


Our translation here deploys similar techniques to those in [18], but we now 
ensure both that the translated pattern matching equations hold as reductions 
in the target theory and that the equations can be given a conventional
operational semantics [1] directly, preserving termination and confluence. By doing
so, we justify pattern matching as a language construct, in the style of ALF [13], 
without compromising the role of the elimination constant in characterising the 
meaning of data. 


An early approximant of our translation was added to the LEGO system [12] 
and demonstrated at ‘Types 1998’. To date, McBride’s thesis [15] is the only
account of it, but there the treatment of the empty program is unsatisfying, 
the computational behaviour is verified only up to conversion, and the issue of 
unmatched but trusted terms in pattern matching rules is barely considered. 


Our recent work describes the key equipment. The account of elimination 
in uses a heterogeneous equality to express unification constraints over de- 
pendently typed data. Hence where Coquand’s pattern matching invokes an 
external notion of unification and of structural recursion, we have built the tools 
we need within type theory [17]. Now, finally, we assemble these components to
perform dependent pattern matching by elimination. 


Overview The rest of the paper is organised as follows. Section 2 examines
pattern matching with dependent types, and develops basic definitions,
including that of specialisation in patterns, as well as the programs which will
eventually be translatable to type theory. The key technical definition here is
that of splitting tree; novel here is the recording of explicit evidence for
impossible case branches. Section 3 describes the target type theory. This is extended
by function symbols with defining equations which determine reduction rules,
subject to certain conditions. The allowable such function definitions arise from
the existence of valid splitting trees. Finally, Section 4 shows how such
function definitions may be eliminated in favour of closed terms in the type theory
with the same reduction behaviour; the valid splitting trees precisely correspond
to the terms built from constructor case analysis and structural recursion on
inductive families, modulo the heterogeneous equality Eq.


2 Dependent Pattern Matching


Let us first take a look at what dependent pattern matching is, and why it is 
a more subtle notion than its simply typed counterpart. Inductive families gain 
their precision from the way their constructors have specialised return types. For 
example, the constructors of Fin can only make elements of sets whose ‘size’ is 
non-zero. Consider writing some function p (i : Nat; x : Fin i) : ⋯. Trying to
match on x without instantiating i is an error. Rather, one must take account
of the fact that i is sure to be a suc, if p is to typecheck:


✗  p i fz ↦ ⋯              p (suc j) fz ↦ ⋯
✗  p i (fs y) ↦ ⋯          p (suc j) (fs y) ↦ ⋯


Of course, there need not be any actual check at run time whether these (suc j) 
patterns match—the type system guarantees that they must if the patterns for 
x do. This is not merely a convenient optimisation, it is a new and necessary 
phenomenon to consider. For example, we may define the property of ‘being in 
the image of f’ for some fixed f : S → T, then equip f with an ‘inverse’:


Imf (t : T) : ⋆ = imf (s : S) : Imf (f s)

inv (t : T; p : Imf t) : S
inv (f s) (imf s) ↦ s


The typing rules force us to write (f s) for t, but there is no way in general that 
we can compute s from t by inverting f. Of course, we actually get s from the 
constructor pattern (imf s) for p, together with a guarantee that t is (f s).
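
The point that s is read off the proof pattern, and never computed by inverting f, can be illustrated with a small sketch. This is Python rather than the type theory of the paper, so the typing guarantee is only simulated; the names `Imf` and `inv` mirror the text, while the sample function `f` is an assumption chosen to be non-injective.

```python
from dataclasses import dataclass
from typing import Any


def f(s: int) -> int:
    """A deliberately non-injective sample function (assumed for this sketch)."""
    return s * s


@dataclass(frozen=True)
class Imf:
    """Evidence that a value is in the image of f: `imf s` stores the witness s."""
    s: Any


def inv(t: Any, p: Imf) -> Any:
    # t corresponds to the forced term (f s) on the left-hand side:
    # it is never inspected, let alone inverted.
    # The witness comes from the constructor pattern (imf s) for p.
    return p.s
```

Both `inv(f(3), Imf(3))` and `inv(f(-3), Imf(-3))` receive the same first argument, yet return different witnesses, which is only possible because the first argument plays no computational role.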

We have lost the ability to consider patterns for each argument independently. 
Moreover, we have lost the distinction of patterns as the sub-language of terms 
consisting only of the linear constructor forms, and with this, the interpretation 
of defining equations as rewriting rules is insufficient. It is not enough just to 
assign dependent types to conventional programs: specialised patterns change 
what programs can be. 

Let us adapt to these new circumstances, and gain from specialisation, ex- 
ploiting the information it delivers ‘for free’. For example, in a fully decorated 
version of the step case of the above definition of the trans function, 


trans_{(suc n); (fs_n i); (fs_n j); (fs_n k)} (leqs_{n;i;j} p′) (leqs_{n;j;k} q′) ↦
  leqs_{n;i;k} (trans_{n;i;j;k} p′ q′)


it is precisely specialisation that ensures the p′ and q′ are not arbitrary ≤ proofs,
but rather appropriate ones, which justify the recursive call to trans. Meanwhile, 
we need not analyse the case 


trans_{(suc n); (fs_n i); ?; k} (leqs_{n;i;j} p′) leqz_{n;k} : fs_n i ≤_(suc n) k


because the two proof patterns demand incompatible specialisations of the mid- 
dle value upon which they must agree. In general, specialisation is given by the 
most general unifier for the type of the value being analysed and the type of the 
pattern used to match it. Later, we shall be precise about how this works, but 
let us first sketch how we address its consequences. 
2.1 Patterns with Inaccessible Terms


The key to recovering an operational interpretation for these defining equations 
is to find the distinction between those parts which require constructor matching, 
and those which merely report specialisation. We shall show how to translate the 
terms on the left-hand sides of definitional equations written by the programmer 
into patterns which, following Brady [2], augment the usual linear constructor 
forms with a representation for the arbitrary terms reported by specialisation 
and presupposed to match. 


Definition 1 (Patterns)


pat := x            ⌈x⌉ ⇒ x                AV(x) ⇒ {x}
     | c pat*       ⌈c p⃗⌉ ⇒ c ⌈p⃗⌉          AV(c p⃗) ⇒ AV(p⃗)
     | ⌊term⌋       ⌈⌊t⌋⌉ ⇒ t              AV(⌊t⌋) ⇒ ∅


lhs := f pat*       ⌈f p⃗⌉ ⇒ f ⌈p⃗⌉          AV(f p⃗) ⇒ AV(p⃗)


We say the terms marked ⌊t⌋ are inaccessible to the matcher and may not bind
variables. The partial map AV(—) computes the set of accessible variables, where
AV(p⃗) is the disjoint union, ⊎i AV(pi), hence AV(—) is defined only for linear
patterns. The map ⌈—⌉ takes patterns back to terms.
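
As a rough illustration of Definition 1, the following sketch models patterns as a small Python datatype with the two maps described above. The class and function names (`PVar`, `PCon`, `PInacc`, `accessible_vars`, `to_term`) are our own choices for this sketch, not notation from the paper, and left-hand sides are treated as constructor-like applications for brevity.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:            # a term variable
    name: str

@dataclass(frozen=True)
class App:            # a head symbol applied to argument terms
    head: str
    args: Tuple["Term", ...] = ()

Term = Union[Var, App]


@dataclass(frozen=True)
class PVar:           # accessible pattern variable
    name: str

@dataclass(frozen=True)
class PCon:           # constructor (or function head) applied to patterns
    con: str
    args: Tuple["Pat", ...] = ()

@dataclass(frozen=True)
class PInacc:         # inaccessible term: presupposed to match, binds nothing
    term: Term

Pat = Union[PVar, PCon, PInacc]


def accessible_vars(p: Pat) -> frozenset:
    """AV(p). Partial: raises ValueError when p is not linear."""
    if isinstance(p, PVar):
        return frozenset({p.name})
    if isinstance(p, PInacc):
        return frozenset()
    seen: set = set()
    for q in p.args:
        vs = accessible_vars(q)
        if seen & vs:
            raise ValueError("non-linear pattern")
        seen |= vs
    return frozenset(seen)


def to_term(p: Pat) -> Term:
    """Take a pattern back to a term, forgetting which parts are inaccessible."""
    if isinstance(p, PVar):
        return Var(p.name)
    if isinstance(p, PInacc):
        return p.term
    return App(p.con, tuple(to_term(q) for q in p.args))


# Left-hand side of inv: the first argument is inaccessible, the second binds s.
inv_lhs = PCon("inv", (PInacc(App("f", (Var("s"),))), PCon("imf", (PVar("s"),))))
```

Here `accessible_vars(inv_lhs)` contains only `s`, once: the occurrence of `s` inside the inaccessible term does not count, so the left-hand side is linear even though the corresponding term mentions `s` twice.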

We can now make sense of our inv function: its left-hand side becomes 


inv ⌊f s⌋ (imf s)


Matching for these patterns is quite normal, with inaccessible terms behaving like
‘don’t care’ patterns, although our typing rules will always ensure that there is
actually no choice! We define MATCH to be a partial operation yielding a
matching substitution, throwing a CONFLICT exception,⁵ or failing to make progress
only in the case of non-canonical values in a nonempty context.


Definition 2 (Matching) Matching is given as follows: 


MATCH(x, t)                  ⇒ [x ↦ t]
MATCH(chalk p⃗, chalk t⃗)      ⇒ MATCHES(p⃗, t⃗)
MATCH(chalk p⃗, cheese t⃗)     ⇑ CONFLICT
MATCH(⌊u⌋, t)                ⇒ ε


MATCHES(ε, ε)                ⇒ ε
MATCHES(p; p⃗, t; t⃗)          ⇒ MATCH(p, t); MATCHES(p⃗, t⃗)

So, although definitional equations are not admissible as rewriting rules just 
as they stand, we can still equip them with an operational model which relies 
only on constructor discrimination. This much, at least, remains as ever it was. 
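
The matching operation of Definition 2 can be sketched in a few lines of Python. The tuple encoding of patterns and terms, and the exception names `Conflict` and `Stuck`, are assumptions made for this illustration; well-typedness (and hence equal arities) is taken for granted rather than checked.

```python
class Conflict(Exception):
    """Two distinct constructors met: this rule definitely does not apply."""


class Stuck(Exception):
    """A non-canonical value blocks matching (only possible in a nonempty context)."""


# Patterns: ("var", x) | ("con", c, [patterns]) | ("inacc", term)
# Terms:    ("con", c, [terms]) for canonical values, anything else is non-canonical.

def match(pat, term):
    kind = pat[0]
    if kind == "var":                 # MATCH(x, t) => [x -> t]
        return {pat[1]: term}
    if kind == "inacc":               # inaccessible: never inspected, binds nothing
        return {}
    _, c, pats = pat                  # constructor pattern
    if term[0] != "con":
        raise Stuck(term)
    _, d, terms = term
    if c != d:                        # chalk against cheese
        raise Conflict((c, d))
    return matches(pats, terms)


def matches(pats, terms):
    """MATCHES: match argument vectors pointwise and combine the substitutions."""
    subst = {}
    for p, t in zip(pats, terms):
        subst.update(match(p, t))
    return subst


zero = ("con", "zero", [])
# inv's patterns: an inaccessible (f s), then the constructor pattern (imf s).
inv_pats = [("inacc", ("app", "f", [("var", "s")])), ("con", "imf", [("var", "s")])]
```

Matching `inv_pats` against any first argument and `("con", "imf", [zero])` binds `s` to `zero` without ever looking at the first argument, which is the behaviour the text describes as relying only on constructor discrimination.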

Before we move on, let us establish a little equipment for working with
patterns. In our discussion, we write p[x] to stand for p with an accessible x
abstracted. We may thus form the instantiation p[p′] if p′ is a pattern with variables

5 We take chalk and cheese to stand for an arbitrary pair of distinct constructors. 




disjoint from those free in p[—], pasting p′ for the accessible occurrence of x and
⌈p′⌉ for the inaccessible copies. In particular, p[c y⃗] is a pattern, given fresh y⃗.
Meanwhile, we shall need to apply specialising substitutions to patterns:


Definition 3 (Pattern Specialisation) If σ is a substitution from variables
Δ to terms over Δ′ with AV(p) = Δ ⊎ Δ′ (making σ idempotent), we define the
specialisation σp, lifting σ to patterns recursively as follows:


σx ⇒ ⌊σx⌋  if x ∈ Δ        σ(c p⃗) ⇒ c (σp⃗)        σ⌊t⌋ ⇒ ⌊σt⌋
σx ⇒ x     if x ∈ Δ′


Observe that AV(σp) = Δ′.


Specialisations, being computed by unification, naturally turn out to be idem- 
potent. Their effect on a pattern variable is thus either to retain its accessibility 
or to eliminate it entirely, replacing it with an inaccessible term. Crucially, spe- 
cialisation preserves the availability of a matching semantics despite apparently 
introducing nonlinearity and non-constructor forms. 
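
A minimal sketch of pattern specialisation follows, under the reading that a substituted variable becomes an inaccessible term while untouched variables stay accessible. The tuple encoding and the names `specialise` and `subst_term` are assumptions of this sketch.

```python
# Terms:    ("var", x) | ("app", head, [terms])
# Patterns: ("var", x) | ("con", c, [patterns]) | ("inacc", term)

def subst_term(sigma, t):
    """Apply a substitution (a dict from variable names to terms) to a term."""
    if t[0] == "var":
        return sigma.get(t[1], t)
    _, head, args = t
    return ("app", head, [subst_term(sigma, a) for a in args])


def specialise(sigma, p):
    """Lift sigma to patterns: variables in its domain lose their accessibility."""
    kind = p[0]
    if kind == "var":
        x = p[1]
        return ("inacc", sigma[x]) if x in sigma else p
    if kind == "inacc":
        return ("inacc", subst_term(sigma, p[1]))
    _, c, pats = p
    return ("con", c, [specialise(sigma, q) for q in pats])


# Splitting x : Fin i with the constructor fz forces i to be (suc j).
sigma = {"i": ("app", "suc", [("var", "j")])}
before = [("var", "i"), ("con", "fz", [])]
after = [specialise(sigma, p) for p in before]
```

In `after`, the first pattern is the inaccessible term `suc j`, so a matcher built as in Definition 2 still only discriminates on the constructor `fz`, even though the left-hand side now mentions `suc`.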


2.2 Program Recognition 


The problem we address in this paper is to recognize programs as total functions 
in UTT+K. Naturally, we cannot hope to decide whether it is possible to con- 
struct a functional value exhaustively specified by a set of arbitrary equations. 
What we can do is fix a recognizable and total fragment of those programs whose 
case analysis can be expressed as a splitting tree of constructor discriminations 
and whose recursive calls are on structurally decreasing arguments. 

The idea is to start with a candidate left-hand side whose patterns are just 
variables and to grow a partition by analysing a succession of pattern variables 
into constructor cases. This not only gives us an efficient compilation in the style 
of Augustsson [1], it will also structure our translation, with each node mapping
to the invocation of an eliminator. Informally, for trans, we build the tree 


trans p q
  p : i ≤ j
    trans leqz q ↦ leqz
    trans (leqs p′) q
      q : fs j ≤ k
        trans (leqs p′) leqz                    (no such case)
        trans (leqs p′) (leqs q′) ↦ leqs (trans p′ q′)


The program just gives the leaves of this tree: finding the whole tree guaran- 
tees that it partitions the possible input. The recursion reduces the size of one 
argument (both, in fact, but one is enough), so the function is total. 

However, if we take a ‘program’ just to be a set of definitional equations, even 
this recognition problem is well known to be undecidable M521]. The difficulty 
for the recognizer is the advantage for the programmer: specialisation can prune 
the tree! Above, we can see that q must be split to account for (leqs q’), and 
having split q, we can confirm that no leqz case is possible. But consider the
signature empty (i : Fin zero) : X. We have the splitting tree:

empty i
  i : Fin zero      (no cases)


If we record only the leaves of the tree for which we return values, we shall 
not give the recognizer much to work from! More generally, it is possible to 
have arbitrarily large splitting trees with no surviving leaves—it is the need to 
recover these trees from thin air that makes the recognition of equation sets 
undecidable. Equations are insufficient to define dependently typed functions, 
so we had better allow our programs to consist of something more. We extend
the usual notion of program to allow clauses f t⃗ ∤ x which refute a pattern
variable, requiring that splitting it leaves no children. For example, we write


empty (i : Fin zero)
empty i ∤ i


We now give the syntax for programs and splitting trees. 


Definition 4 (Program, Splitting Tree) 


program := f (context) : term         splitting := compRule
           clause+                               | [context] lhs
clause  := f term* rhs                             x {splitting+}
rhs     := ↦ term                     compRule  := [context] lhs rhs
         | ∤ x


We say that a splitting tree solves the programming problem [Δ] f p⃗, if these are
the context and left-hand side at its root node. Every such programming problem
must satisfy AV(p⃗) = Δ, ensuring that every variable is accessible.
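
To make the shape of Definition 4 concrete, here is a small sketch of splitting trees as a Python datatype, with a function that reads the program clauses off the leaves. Left-hand and right-hand sides are kept as plain strings, and the class names (`Leaf`, `Refute`, `Split`) are our own; nothing here checks typing or validity.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:        # computation rule returning a value: lhs |-> rhs
    lhs: str
    rhs: str

@dataclass
class Refute:      # computation rule refuting a variable: splitting it has no children
    lhs: str
    var: str

@dataclass
class Split:       # internal node: case analysis on one pattern variable
    lhs: str
    var: str
    children: List["Tree"]

Tree = Union[Leaf, Refute, Split]


def clauses(tree: Tree) -> list:
    """The program is just the leaves of its splitting tree, read left to right."""
    if isinstance(tree, Split):
        return [c for child in tree.children for c in clauses(child)]
    return [tree]


# The tree for trans: the leqz case for q is pruned by specialisation,
# so the second split has a single child.
trans_tree = Split("trans p q", "p", [
    Leaf("trans leqz q", "leqz"),
    Split("trans (leqs p') q", "q", [
        Leaf("trans (leqs p') (leqs q')", "leqs (trans p' q')"),
    ]),
])

# The tree for empty: one refutation, with no value-returning leaves at all.
empty_tree = Refute("empty i", "i")
```

Recording `Refute` nodes explicitly is what lets a recogniser rebuild the tree for `empty`, whose set of ordinary equations is empty.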


To recognize a program with clauses {f t⃗i ri | 0 ≤ i < n} is to find a
valid splitting tree with computation rules {[Δi] f p⃗i ri | 0 ≤ i < n} such that
⌈f p⃗i⌉ = f t⃗i and to check the guardedness of the recursion. We defer the precise
notion of ‘valid’ until we have introduced the type system formally, but it will
certainly be the case that if an internal node has left-hand side f p⃗[x], then
its children (numbering at least one) have left-hand sides f σp⃗[c y⃗] where c is
a constructor and σ is the specialising substitution which unifies the datatype
indices of x and c y⃗.

We fix unification to be first-order with datatype constructors as the rigid
symbols [10]: we have systematically shown constructors to be injective and
disjoint, and that inductive families do not admit cyclic terms [17]. Accordingly,
we have a terminating unification procedure for two vectors of terms which will
either succeed positively (yielding a specialising substitution), succeed negatively
(establishing a constructor conflict or cyclic equation), or fail because the
problem is too hard. Success is guaranteed if the indices are in constructor form.
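
The three possible outcomes of this unification procedure can be sketched as follows. This is an illustrative first-order unifier, not the construction of [17]: the tuple encoding, the `"fun"` tag for non-constructor applications and the result tags `"yes"`, `"no"` and `"stuck"` are assumptions of the sketch, and the substitution is returned in triangular form rather than fully resolved.

```python
def var(x):
    return ("var", x)

def con(c, *args):
    return ("con", c, list(args))

zero = con("zero")

def suc(t):
    return con("suc", t)


def walk(t, s):
    """Chase variable bindings through the substitution s."""
    while t[0] == "var" and t[1] in s:
        t = s[t[1]]
    return t

def occurs(x, t, s):
    t = walk(t, s)
    if t[0] == "var":
        return t[1] == x
    return any(occurs(x, a, s) for a in t[2])

def constructor_form(t, s):
    t = walk(t, s)
    if t[0] == "var":
        return True
    return t[0] == "con" and all(constructor_form(a, s) for a in t[2])


def unify(equations):
    """Return ("yes", subst), ("no", reason) or ("stuck", equation)."""
    s = {}
    todo = list(equations)
    while todo:
        a, b = todo.pop()
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if a[0] != "var" and b[0] == "var":
            a, b = b, a
        if a[0] == "var":
            if occurs(a[1], b, s):
                # A cycle through constructors only is a negative success.
                if constructor_form(b, s):
                    return ("no", "cycle")
                return ("stuck", (a, b))
            s[a[1]] = b
        elif a[0] == "con" and b[0] == "con":
            if a[1] != b[1]:
                return ("no", "conflict")      # disjoint constructors
            todo.extend(zip(a[2], b[2]))       # injectivity
        else:
            return ("stuck", (a, b))           # a defined function is in the way
    return ("yes", s)
```

For instance, unifying the index `i` with `suc j` succeeds positively, `zero` against `suc n` and `n` against `suc n` succeed negatively, and an equation headed by a defined function such as `plus` is reported as too hard.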
We can thus determine if a given left-hand side may be split at a given
pattern variable (we require all the index unifications to succeed) and generate
specialised children for those which succeed positively. We now have:


Lemma 5 (Decidable Coverage) Given f (x⃗ : S⃗) : T; {f t⃗i ri | 0 ≤ i < n},
it is decidable whether there exists a splitting tree, with root [x⃗ : S⃗] f x⃗ and
computation rules {[Δi] f p⃗i ri | 0 ≤ i < n} such that ⌈f p⃗i⌉ = f t⃗i.


Proof The total number of constructor symbols in the subproblems of a split- 
ting node strictly exceeds those in the node’s problem. We may thus generate all 
candidate splitting trees whose leaves bear at most the number of constructors 
in the program clauses and test if any yield the program. 














Coquand’s specification of a covering set of patterns requires the construction 
of a splitting tree: if we can find a covering for a given set of equations, we may 
read off one of our programs by turning the childless nodes into refutations. 
As far as recursion checking is concerned, we may give a criterion a little more 
generous than Coquand’s original [4].


Definition 6 (Guardedness, Structural Recursion) We define the binary
relation ≺, ‘is guarded by’, inductively on the syntax of terms:


cic Jst rpg axi 
reren fs<t rat 


tp San TH 
We say that a program f (x⃗ : S⃗) : T; {f t⃗i ri | 0 ≤ i < n} is structurally recursive
if, for some argument position j, we have that every recursive call f s⃗ which is a
subterm of some ri satisfies sj ≺ tij.


It is clearly decidable whether a program is structurally recursive in this 
sense. Unlike Coquand, we do permit one recursive call within the argument of 
another, although this distinction is merely one of convenience. We could readily 
extend this criterion to cover lexicographic descent on a number of arguments, 
but this too is cosmetic. Working in a higher-order setting, we can express the 
likes of Ackermann’s function, which stand beyond first-order primitive recur- 
sion. Of course, the interpreter for our own language is beyond it. 
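
The remark about Ackermann's function can be made concrete. The sketch below uses a Python stand-in for the primitive recursor on natural numbers (`nat_rec`, our own name) and defines Ackermann's function by recursion on the first argument alone, returning a function at each stage; the inner recursion on the second argument is again an instance of the same recursor. This is an untyped illustration of the idea, not a term of UTT.

```python
def nat_rec(z, s, n):
    """Primitive recursion on n: nat_rec(z, s, 0) = z,
    nat_rec(z, s, k + 1) = s(k, nat_rec(z, s, k))."""
    acc = z
    for k in range(n):
        acc = s(k, acc)
    return acc


def ack(m):
    """Return the function n -> A(m, n), by recursion on m at function type.

    A(0, n)     = n + 1
    A(m+1, 0)   = A(m, 1)
    A(m+1, n+1) = A(m, A(m+1, n))
    """
    return nat_rec(
        lambda n: n + 1,
        # prev is A(m, -); the result is A(m+1, -), itself defined by nat_rec on n.
        lambda _k, prev: (lambda n: nat_rec(prev(1), lambda _j, r: prev(r), n)),
        m,
    )
```

Each use of `nat_rec` descends structurally on a single argument; it is the higher-order result type of the outer recursion that takes the definition beyond first-order primitive recursion.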


3 Type Theory and Pattern Matching 


We start from a predicative subsystem of Luo’s UTT [11], with rules of
inference given in Fig. 1. UTT’s dependent types and inductive types and families are
the foundation for dependent pattern matching. Programs with pattern
matching are written over types in the base type universe ⋆₀, which we call small
types. Eliminations over types to solve unification are written in ⋆₁, and the
Logical-Framework-level universe □ is used to define a convenient presentation
of equality from the traditional I, J and K. Our construction readily extends to
Fig. 1. Luo’s UTT (functional core)


the additional hierarchy of universes of full UTT. The impredicative universe of 
propositions in UTT is not relevant to explaining pattern matching through the 
primitive constructs of type theory, and so we omit it. 

We identify terms that are equivalent up to the renaming of bound variables,
and we write [x ↦ s]t for the usual capture-free substitution of s for the free
variable x in t.

UTT is presented through the Logical Framework, a meta-language with 
typed arities for introducing the constants and equalities that define a type the- 
ory. While the Logical Framework is essential to the foundational understanding 
of UTT, it is notationally cumbersome, and we shall hide it as much as possible. 
We shall not distinguish notationally between framework Π kinds and
object-level Π types, nor between the framework and object representations of types.
We justify this by observing that □ represents the types in the underlying
framework, and that ⋆₀ and ⋆₁ are universes with names of specific types within □.
However, informality with respect to universes may lead to size issues if we are 
not careful, and we shall explicitly mention the cases where it is important to 
distinguish between the framework and object levels. 


























There is no proof of the standard metatheoretic properties for the theory 
UTT plus K that we take as our target language. Goguen’s thesis [8] establishes 
the metatheory for a sub-calculus of UTT with the Logical Framework, a single 
universe and higher-order inductive types but not inductive families or the K
combinator. Walukiewicz-Chrzaszcz shows that certain higher-order rewrite 
rules are terminating in the Calculus of Constructions, including inductive fam- 
ilies and the K combinator, but the rewrite rules do not include higher-order 
inductive types, and the language is not formulated in the Logical Framework. 

However, our primary interest is in justifying dependent pattern matching by 
translation to a traditional presentation of type theory, and UTT plus K serves 
this role very well. Furthermore, the extensions of additional universes, inductive 
relations and the K combinator to the language used in Goguen’s thesis would 
complicate the structure of the existing proof of strong normalization but do not 
seem to represent a likely source of non-termination. 


3.1 Telescope Notation 


We shall be describing general constructions over dependent datatypes, so we 
need some notational conveniences. We make considerable use of de Bruijn’s 
telescopes [5] (dependent sequences of types), sharing the syntax of contexts.
We also use Greek capitals to stand for them. We may check telescopes (and 
constrain the universe level α of the types they contain) with the following
judgment:

  Γ ⊢ valid              Γ ⊢ S : ⋆α     Γ; x:S ⊢ Δ tele(α)
 ---------------        ------------------------------------
  Γ ⊢ ℰ tele(α)          Γ ⊢ x:S; Δ tele(α)


We use vector notation t⃗ to stand for sequences of terms, t1;...;tn. We identify
the application f t1;...;tn with f t1 ... tn. Simultaneous substitutions from a
telescope to a sequence are written [Θ ↦ t⃗], or [t⃗] if the domain is clear.
Substituting through a telescope textually yields a sequence of typings t1:T1;...;tn:Tn
which we may check by iterating the typing judgment. We write t⃗ : Θ for the
sequence of typings [t⃗]Θ, asserting that the t⃗ may instantiate Θ. We also let
Γ ⊢ σΔ assert that σ is a type-correct substitution from Δ to Γ-terms.

We write ΠΔ. T to iterate the Π-type over a sequence of arguments, or
Δ → T if T does not depend on Δ. The corresponding abstraction is λΔ. t. We
also let telescopes stand as the sequence of their variables, so if f : ΠΔ. T, then
Δ ⊢ f Δ : T. The empty telescope is ℰ, the empty sequence, ε.


3.2 Global Declarations and Definitions 


A development in our type theory consists of a global context Γ containing
declarations of datatype families and their constructors, and definitions of function
symbols. To ease our translation, we declare global identifiers g with a
telescope of arguments and we demand that they are applied to a suitable sequence
wherever they are used. Each function f (Δ) : T has a nonempty set of
computation rules. We extend the typing and reduction rules (now contextualised)
accordingly:


  g(Θ) : T ∈ Γ     Γ; Δ ⊢ t⃗ : Θ
 --------------------------------
  Γ; Δ ⊢ g t⃗ : [t⃗]T

  f t⃗ ⇝Γ θe    if [Δ′] f p⃗ ↦ e ∈ Γ and MATCHES(p⃗, t⃗) ⇒ θ
We take the following at least to be basic requirements for defined functions.


Definition 7 (Function Criteria) To extend Γ with f (Δ) : T with
computation rules {[Δi] f p⃗i ri | 0 ≤ i < n}, we require that:


— Γ; Δ ⊢ T : ⋆₀,
— the computation rules arise as the leaves of a splitting tree solving [Δ] f Δ,
— the corresponding program is structurally recursive,
— if ri is ↦ ei, then Γ; Δi ⊢ ei : ⌈p⃗i⌉T.





We shall check basic properties of pattern matching computation shortly, but 
we first give our notion of data (and hence splitting) a firm basis. 


Definition 8 (Inductive Families) Inductive families with n ≥ 1 constructors
are checked for strict positivity and introduced globally as shown in Fig. 2.
We write D̄ for the telescope Ξ; z : D Ξ.


TE tele(xo) {Tr|D(E£):xo F A; con(ŭi) | i < ny}; 
(2) :x0; {ci(Ai):D ŭi [i< ny}; 
{Mj :T1A;. HYPS(4;, BIG) > x1 | i < n}; D):x1 
[M; Ai] Ev M üi (ci Ai) œ Mi Ai RECS(4;, Ep M) | i < n}; 
p(P:D «1; {mi :TA;. HYPS(4;, LITTLE(P))— P t; (c Ai) |i <n};D): PD 
[P; m; Ai] ep P M ŭi (ci Ai) => mi A; RECS(A;, ep P m) | i< n} 
F valid 


T; D 
Ep 
{ 


S5 


o 


~ 


where BIG(_,-) => *1 LITTLE(P)(v, £) => Pv 
030;a: AFA con(i) 
: A; A con(ŭ) 

) => HYPS( A, H) 

) => RECS( A, f) 


DOF: 8 T;OFA:xo I|D(E 
I'\D(&):*0; OF © con(u LI'|D(&):*0; O 
HYPS(e,H) =e HYPS(a : p 
RECS(e, f) => € RECS(a : A; 

r 

v. 


ix 
Fa 
,H 
if 


T;O+ ® tele(xo 








®); HYPS(A, H) 
); RECS(A, f) 


SE 
Sa 
hS 
i, 

1 

Sa 


Fig. 2. Declaring inductive types with constructors 











In Luo’s presentation [11], each inductive datatype is an inhabitant of □; it is
then given a name in the universe ⋆₀. There is a single framework-level eliminator
whose kind is much too large for a UTT type. Our presentation is implemented
on top: D really computes Luo’s name for the type; our UTT eliminators are
readily simulated by the framework-level eliminator. This definition behaves as
usual: for Nat, we obtain


Nat : ⋆₀;  zero : Nat;  suc (n : Nat) : Nat;

E_Nat (Z : ⋆₁; S : Nat → ⋆₁ → ⋆₁; n : Nat) : ⋆₁;
[Z; S]     E_Nat Z S zero    ↦ Z
[Z; S; n]  E_Nat Z S (suc n) ↦ S n (E_Nat Z S n)

e_Nat (P : Nat → ⋆₁; z : P zero; s : Πn : Nat. P n → P (suc n); n : Nat) : P n;
[P; z; s]     e_Nat P z s zero    ↦ z
[P; z; s; n]  e_Nat P z s (suc n) ↦ s n (e_Nat P z s n)


Given this, the Fin declaration yields the following (we suppress E_Fin):


Fin (n : Nat) : ⋆₀;  fz (n : Nat) : Fin (suc n);  fs (n : Nat; i : Fin n) : Fin (suc n);
E_Fin ⋯;
e_Fin (P : Πn : Nat. Fin n → ⋆₁;
       z : Πn : Nat. P (suc n) (fz_n);  s : Πn : Nat; i : Fin n. P n i → P (suc n) (fs_n i);
       n : Nat; i : Fin n) : P n i
[P; z; s; n]     e_Fin P z s (suc n) (fz_n)   ↦ z n
[P; z; s; n; i]  e_Fin P z s (suc n) (fs_n i) ↦ s n i (e_Fin P z s n i)





All of our eliminators will satisfy the function criteria: each has just one split, 
resulting in specialised, inaccessible patterns for the indices. As the indices may 
be arbitrary terms, this is not merely convenient but essential. Rewriting with the 
standard equational laws which accompany the eliminators of inductive families 
is necessarily confluent. 
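
The computation rules of the Fin eliminator can be mimicked in Python to show that the index argument is never inspected. In this untyped sketch the motive P is dropped, elements of Fin are tagged tuples carrying their implicit argument n, and the names `elim_fin`, `fz`, `fs` and `to_int` are our own.

```python
def fz(n):
    """fz_n : Fin (suc n)"""
    return ("fz", n)

def fs(n, i):
    """fs_n i : Fin (suc n), for i : Fin n"""
    return ("fs", n, i)


def elim_fin(z, s, index, x):
    """Erased analogue of e_Fin P z s index x.

    The index corresponds to the specialised pattern (suc n): its value is
    determined by x, so the function dispatches on the constructor of x only.
    """
    if x[0] == "fz":
        _, n = x
        return z(n)
    _, n, i = x
    return s(n, i, elim_fin(z, s, n, i))


def to_int(index, x):
    """Position of x within its finite set, computed by elimination."""
    return elim_fin(lambda n: 0, lambda n, i, rec: rec + 1, index, x)


two_of_three = fs(2, fs(1, fz(0)))   # an element of Fin 3
```

Because `elim_fin` has a single split, on `x`, it matches the description above of an eliminator whose index patterns are inaccessible.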

Meanwhile, empty families have eliminators which refute their input. 


IHE tele(xo) 
D; D(2):xo; Eo(D):x1; [; z] Ep 
ep(P : D—>x;D):P D; [P;2;2] ep P 


ty) Oy 
a 8 
> > 


ae 


F valid 


We have constructed families over elements of sets, but this does not yield 
‘polymorphic’ datatypes, parametric in sets themselves. As Luo does, so we 
may also parametrise a type constructor, its data constructors and eliminators 
uniformly over a fixed initial telescope of UTT types, including ⋆₀.


3.3 Valid Splitting Trees and Their Properties 


In this section, we deliver the promised notion of ‘valid splitting tree’ and show 
it fit for purpose. This definition is very close to Coquand’s original construction 
of ‘coverings’ from ‘elementary coverings’ [4]. Our contribution is to separate the 
empty splits (with explicit refutations) from the nonempty splits (with nonempty 
subtrees), and to maintain our explicit construction of patterns in linear con- 
structor form with inaccessible terms resulting from specialisation. 
Definition 9 (Valid Splitting Tree) A valid splitting tree for f (Δ) : T has
root problem [Δ] f Δ. At each node,


- either we have Δ′ ⊢ e : [⟦p⃗⟧]T and computation rule
  [Δ′] f p⃗ ↦ e


- or we have problem [Δˣ; x : D v⃗; ˣΔ] f p⃗[x] and for each constructor c (Δᶜ) : D u⃗,
  unification succeeds for u⃗ and v⃗, in which case
  • either all succeed negatively, and the node is the computation rule


    [Δˣ; x : D v⃗; ˣΔ] f p⃗[x] ♮ x

  • or at least one succeeds positively, and the node is a split of form


    [Δˣ; x : D v⃗; ˣΔ] f p⃗[x]
        x  S


    Each positive success yields a pair (Δ′, σ) where σ is a most general
    idempotent unifier for u⃗ and v⃗ satisfying Δ′ ⊢ σ(Δᶜ; Δˣ) and dom(σ) ⊎
    Δ′ = Δᶜ ⊎ Δˣ, and contributes a subtree to S with root


    [Δ′; σ[x ↦ c Δᶜ]ˣΔ] f σp⃗[c Δᶜ]


We shall certainly need to rely on the fact that matching well typed terms 
yields type-correct substitutions. We must also keep our promise to use inacces- 
sible terms in patterns only where there is no choice. 


Definition 10 (Respectful Patterns) For a function f (Δ) : T, we say that
a programming problem [Δ′] f p⃗ has respectful patterns provided


- Δ′ ⊢ ⟦p⃗⟧ : Δ
- if Θ ⊢ a⃗ : Δ and MATCHES(p⃗, a⃗) ⇒ θ, then Θ ⊢ θΔ′ and θ⟦p⃗⟧ ≡ a⃗.


Let us check that valid splitting trees maintain the invariant. 


Lemma 11 (Functions have respectful patterns) If f (Δ) : T with
computation rules {[Δᵢ] f p⃗ᵢ rᵢ | 0 ≤ i < n} satisfies the function criteria, then
each [Δᵢ] f p⃗ᵢ has respectful patterns.


Proof The root problem [Δ] f Δ readily satisfies these properties. We
must show that splitting preserves them. Given a typical split as above, taking
[Δˣ; x : D v⃗; ˣΔ] f p⃗[x] to some [Δ′; σ[x ↦ c Δᶜ]ˣΔ] f σp⃗[c Δᶜ], let us show the
latter is respectful. We have Δˣ; x : D v⃗; ˣΔ ⊢ ⟦p⃗[x]⟧ : Δ, hence idempotence of σ
yields Δ′; x : D σv⃗; σˣΔ ⊢ ⟦σp⃗[x]⟧ : Δ. But c σΔᶜ : D σu⃗ ≡ D σv⃗, hence
Δ′; σ[x ↦ c Δᶜ]ˣΔ ⊢ ⟦σp⃗[c Δᶜ]⟧ : Δ.

Now suppose MATCHES(σp⃗[c Δᶜ], a⃗) ⇒ φ for Θ ⊢ a⃗ : Δ. For some b⃗ : Δᶜ,
we must have MATCHES(p⃗[x], a⃗) ⇒ θ; [x ↦ c b⃗]. By assumption, the p⃗[x] are
respectful, so Θ ⊢ (θ; [x ↦ c b⃗])(Δˣ; x : D v⃗; ˣΔ), hence c b⃗ : D θv⃗ ≡ D [Δᶜ ↦ b⃗]u⃗,
and (θ; [x ↦ c b⃗])⟦p⃗[x]⟧ ≡ a⃗. Rearranging, we get (θ; [Δᶜ ↦ b⃗])⟦p⃗[c Δᶜ]⟧ ≡ a⃗.
But θ; [Δᶜ ↦ b⃗] unifies u⃗ and v⃗ and thus factors as θ′ · σ, as σ is the most
general unifier. By idempotence of σ, θ′ and θ; [Δᶜ ↦ b⃗] coincide on Δ′. But
φ coincides with θ; [Δᶜ ↦ b⃗] on Δ′ because they match the same subterms of
the a⃗, so θ; [Δᶜ ↦ b⃗] = φ · σ, hence φ⟦σp⃗[c Δᶜ]⟧ ≡ a⃗. Moreover, we now have
Θ ⊢ (φ · σ)Δᶜ and Θ ⊢ (φ · σ)(Δˣ; x : D v⃗; ˣΔ), but idempotence makes Δ′ a
subcontext of σ(Δᶜ; Δˣ), so Θ ⊢ φ(Δ′; σˣΔ) as required.














Lemma 12 (Matching Reduction Preserves Type) If Θ ⊢ f a⃗ : A and
f (Δ) : T has a computation rule [Δ′] f p⃗ ↦ e for which MATCHES(p⃗, a⃗) ⇒ θ,
then Θ ⊢ θe : A.


Proof By inversion of the typing rules, we must have [a⃗]T ≼ A. By
respectfulness, we have Θ ⊢ θΔ′ and a⃗ ≡ θ⟦p⃗⟧. By construction, Δ′ ⊢ e : [⟦p⃗⟧]T, hence
Θ ⊢ θe : [θ⟦p⃗⟧]T ≡ [a⃗]T ≼ A.














Lemma 13 (Coverage) If a function f (Δ) : T is given by computation rules
{[Δᵢ] f p⃗ᵢ rᵢ | 0 ≤ i < n}, then for any Θ ⊢ t⃗ : Δ, it is not the case that for
each i, MATCHES(p⃗ᵢ, t⃗) ⇑ CONFLICT.


Proof An induction on splitting trees shows that if we have root problem f p⃗
and MATCHES(p⃗, t⃗) ⇒ θ for well typed arguments t⃗, matching cannot yield
CONFLICT at all the leaf patterns. Either the root is the leaf and the result is
trivial, or the root has a split at some x : D v⃗. In the latter case, we either have θx
not in constructor form and matching gets stuck, or θx = c b⃗ where c (Δᶜ) : D u⃗,
hence unifying u⃗ and v⃗ must have succeeded positively, yielding some σ for which
we have a subtree whose root patterns σp⃗[c Δᶜ] also match t⃗. Inductively, not
all of this subtree’s leaf patterns yield CONFLICT.














It may seem a little odd to present coverage as ‘not CONFLICT in all cases’, 
rather than guaranteed progress for closed terms. But our result also treats 
the case of open terms, guaranteeing that progress can only be blocked by the 
presence of non-constructor forms. 


Lemma 14 (Canonicity) For global context Γ, if Γ ⊢ t : D v⃗, with t in normal
form, then t is c b⃗ for some b⃗.


Proof Select a minimal counterexample. This is necessarily a ‘stuck function’,
f a⃗. By the above reasoning, we must have some internal node in f’s splitting
tree [Δˣ; x : D v⃗; ˣΔ] f p⃗[x] with θ⟦p⃗[x]⟧ ≡ a⃗ but Γ ⊢ θx : D θv⃗ a non-constructor
form. But θx is a proper subterm of f a⃗, hence a smaller counterexample.














Lemma 15 (Confluence) If every function defined in Γ satisfies the function
criteria, then ⇝_Γ is confluent.


Proof Function symbols and constructor symbols are disjoint. By construc- 
tion, splitting trees yield left-hand sides which match disjoint sets of terms. 
Hence there are no critical pairs. 
4 Translating Pattern Matching


In this section, we shall give a complete translation from functions satisfying the 
function criteria and inhabiting small types to terms in a suitable extension of 
UTT, via the primitive elimination operators for inductive datatypes. We do this 
by showing how to construct terms corresponding to the splitting trees which 
give rise to the functions: we show how to represent programming problems as 
types for which splitting trees deliver inhabitants, and we explain how each step 
of problem reduction may be realised by a term. 


4.1 Heterogeneous Equality 


We must first collect the necessary equipment. The unification which we take 
for granted in splitting trees becomes explicit equational reasoning, step by step. 
We represent problems using McBride’s heterogeneous equality [16]: 


Eq(S, T : ⋆₀; s : S; t : T) : ⋆₁;   refl(R : ⋆₀; r : R) : Eq R R r r;
subst(R : ⋆₀; s, t : R; q : Eq R R s t; P : R → ⋆₁; p : P s) : P t;
[R; r; P; p] subst R r r (refl R r) P p ↦ p


Eq is not a standard inductive definition: it permits the expression of
heterogeneous equations, but its eliminator subst gives the Leibniz property only for
homogeneous equations. This is just a convenient repackaging of the traditional
homogeneous identity type family. The full construction can be found in [15].

It is to enable this construction that we keep equations in ⋆₁. We shall be
careful to form equations over data sets, but not equality sets. We are unsure
whether it is safe to allow equality sets in ⋆₀, even though this would not yield
an independent copy of ⋆₀ in ⋆₀. At any rate, it is sufficient that we can form
equations over data and eliminate data over equations.

We shall write s ≃ t for Eq S T s t when the types S, T are clear. Furthermore
Eq precisely allows us to express equations between sequences of data in the same
telescope: the constraints which require the specialisation of datatype indices
take exactly this form. Note we always have D tele(⋆₀), hence if s⃗, t⃗ : D, we may
form the telescope of equations q₁ : s₁ ≃ t₁; …; qₙ : sₙ ≃ tₙ tele(⋆₁), which
we naturally abbreviate as s⃗ ≃ t⃗. Correspondingly, we write refl t⃗ : t⃗ ≃ t⃗.


4.2 Standard Equipment for Inductive Datatypes 


In , we show how to equip every datatype with some useful tools, derived from 
its eliminator, which we shall need in the constructions to come. To summarise, 


case_D is just e_D weakened by dropping the inductive hypotheses.

Below_D(P : D → ⋆₁; D) : ⋆₁ is the ‘course of values’, defined inductively by
Giménez [7]; simulated via E_D, Below_D P Ξ x computes an iterated tuple
type asserting P for every value structurally smaller than x. For Nat we get


Below_Nat P zero ↦ 1
Below_Nat P (suc n) ↦ Below_Nat P n × P n

below_D(P : D → ⋆₁; p : ΠD. Below_D P D → P D; D) : Below_D P D constructs
the tuple, given a ‘step’ function, and is simulated via e_D:


below_Nat P p zero ↦ ()
below_Nat P p (suc n) ↦ (λb : Below_Nat P n. (b, p n b)) (below_Nat P p n)


rec_D(P : D → ⋆₁; p : ΠD. Below_D P D → P D; D) : P D is the structural
recursion operator for D, given by rec_D P p D ↦ p D (below_D P p D)
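
To make the ‘course of values’ concrete, the Nat instances of below and rec can be transcribed into an untyped language. The sketch below is an illustration only, assuming natural numbers are Python integers and that the motive P is left implicit; the names below_nat, rec_nat, fib_step and fib are hypothetical.

```python
def below_nat(p, n):
    """Iterated tuple of p-results for every number structurally smaller than n."""
    if n == 0:
        return ()                    # below_Nat P p zero    ↦ ()
    b = below_nat(p, n - 1)
    return (b, p(n - 1, b))          # below_Nat P p (suc n) ↦ (b, p n b)


def rec_nat(p, n):
    """Structural recursion: rec_Nat P p n ↦ p n (below_Nat P p n)."""
    return p(n, below_nat(p, n))


def fib_step(n, b):
    # The step function may look arbitrarily deep into the tuple b,
    # here two levels, which plain one-step recursion does not offer directly.
    if n == 0:
        return 0
    if n == 1:
        return 1
    b_prev, fib_n1 = b               # b = (Below P (n-1), P (n-1))
    _, fib_n2 = b_prev               # b_prev = (Below P (n-2), P (n-2))
    return fib_n1 + fib_n2


def fib(n):
    return rec_nat(fib_step, n)


if __name__ == "__main__":
    print([fib(k) for k in range(8)])
```

The recursive calls of the source program become projections from the tuple, which is the role played by the hypothesis of type Below in the translation of structural recursion.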


We use casep for splitting and recp for recursion. For unification, we need: 


noConfusion_D is the proof that D’s constructors are injective and disjoint,
also a two-level construction, again by example:


NoConfusion_Nat(P : ⋆₁; x, y : Nat) : ⋆₁
NoConfusion_Nat P zero zero ↦ P → P
NoConfusion_Nat P zero (suc y) ↦ P
NoConfusion_Nat P (suc x) zero ↦ P
NoConfusion_Nat P (suc x) (suc y) ↦ (x ≃ y → P) → P


noConfusion_Nat(P : ⋆₁; x, y : Nat; q : x ≃ y) : NoConfusion_Nat P x y
noConfusion_Nat P zero zero (refl zero) ↦ λp : P. p
noConfusion_Nat P (suc x) (suc x) (refl (suc x)) ↦ λp : x ≃ x → P. p (refl x)


NoConfusion_D is simulated by two appeals to E_D; noConfusion_D uses
subst once, then case_D to work down the ‘diagonal’.
noCycle_D disproves any cyclic equation in D; details may be found in [17].


Lemma 16 (Unification Transitions) The following (and their symmetric
images) are derivable:


deletion      m : ΠΔ. P
              ⊢ λΔ; q. m Δ
              : ΠΔ. t ≃ t → P
solution      m : ΠΔ⁰. [x ↦ t]ΠΔ¹. P
              ⊢ λΔ; q. subst T t x q (λx. ΠΔ⁰; Δ¹. P) m Δ⁰ Δ¹
              : ΠΔ. t ≃ x → P
              if Δ ∼ Δ⁰; x : T; Δ¹ and Δ⁰ ⊢ t : T
injectivity   m : ΠΔ. s⃗ ≃ t⃗ → P
              ⊢ λΔ; q. noConfusion P (c s⃗) (c t⃗) q (m Δ)
              : ΠΔ. c s⃗ ≃ c t⃗ → P
conflict      ⊢ λΔ; q. noConfusion P (chalk s⃗) (cheese t⃗) q
              : ΠΔ. chalk s⃗ ≃ cheese t⃗ → P
cycle         ⊢ λΔ; q. noCycle P … q …
              : ΠΔ. x ≃ c p⃗[x] → P

















Proof By construction. 
4.3 Elimination with Unification


In [16], McBride gives a general technique for deploying operators whose types 
resemble elimination rules. We shall use this technique repeatedly in our con- 
structions, hence we recapitulate the basic idea here. Extending the previous 
account, we shall be careful to ensure that the terms we construct not only have 
the types we expect but also deliver the computational behaviour required to 
simulate the pattern matching semantics. 


Definition 17 (Elimination operator) For any telescope Ξ tele(⋆₀), we
define a Ξ-elimination operator to be any


e : ΠP : ΠΞ. ⋆₁. (ΠΔ₁. P s⃗₁) → ⋯ → (ΠΔₙ. P s⃗ₙ) → ΠΞ. P Ξ


Note that e_D is a D-elimination operator; case_D and rec_D are also. We refer
to the Ξ as the targets of the operator as they indicate what is to be eliminated;
we say P is the motive as it indicates why; the remaining arguments we call
methods as they explain how to proceed in each case which may arise. Now let
us show how to adapt such an operator to any specific sequence of targets.


Definition 18 (Basic analysis) If e is a Ξ-elimination operator (as above),
Δ tele(⋆₀) and Δ ⊢ T : ⋆₁, then for any Δ ⊢ t⃗ : Ξ, the basic e-analysis of ΠΔ. T
at t⃗ is the (clearly derivable) judgment


m₁ : ΠΔ₁; Δ. s⃗₁ ≃ t⃗ → T;  …;  mₙ : ΠΔₙ; Δ. s⃗ₙ ≃ t⃗ → T
⊢ λΔ. e (λΞ. ΠΔ. Ξ ≃ t⃗ → T) m₁ … mₙ t⃗ Δ (refl t⃗) : ΠΔ. T


Notice that when e is case_D and the targets are some v⃗; x where x : D v⃗ ∈ Δ,
then for each constructor c (Δᶜ) : D u⃗, we get a method


m_c : ΠΔᶜ; Δ. u⃗ ≃ v⃗ → c Δᶜ ≃ x → T


Observe that the equations on the indices are exactly those we must unify to
allow the instantiation of x with c Δᶜ. Moreover, if we have such an instance
for x, i.e. if θ unifies u⃗ and v⃗, and takes x ↦ c θΔᶜ, then the analysis actually
reduces to the relevant method:


case_D (λD. ΠΔ. D ≃ v⃗; x → T) m⃗ θv⃗ (c θΔᶜ) θΔ (refl θv⃗) (refl (c θΔᶜ))
  ⇝ m_c θΔᶜ θΔ (refl θv⃗) (refl (c θΔᶜ))


We may now simplify the equations in the method types. 


Definition 19 (Specialisation by Unification) Given any type of the form
ΠΔ. u⃗ ≃ v⃗ → T : ⋆₁, we may seek to construct an inhabitant (a specialiser) by
exhaustively iterating the unification transitions from lemma 16 as applicable.
This terminates by the usual argument [10], with three possible outcomes:

negative success a specialiser is found, either by conflict or cycle;

positive success a specialiser is found, given some m : ΠΔ′. σT for σ a most
general idempotent unifier of u⃗ and v⃗, or

failure at some stage, an equation is reached for which no transition applies.
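
The control flow of specialisation, stripped of the proof terms that the transitions of lemma 16 construct, is ordinary first-order unification over constructor forms. The following Python sketch illustrates that control flow only, under loudly stated assumptions: terms are untyped, a variable is a Python string, a constructor form is a tuple (name, arg, …), and the function names are hypothetical. Because every non-variable term here is a constructor form, the ‘failure’ outcome cannot arise in this sketch; it would require non-constructor terms such as stuck function applications.

```python
def is_var(t):
    return isinstance(t, str)


def subst(t, sigma):
    """Apply the substitution sigma (a dict) to the term t."""
    if is_var(t):
        return sigma.get(t, t)
    return (t[0],) + tuple(subst(a, sigma) for a in t[1:])


def occurs(x, t):
    if is_var(t):
        return x == t
    return any(occurs(x, a) for a in t[1:])


def specialise(equations):
    """Iterate the transitions; return ('negative', why) or ('positive', sigma)."""
    sigma, todo = {}, list(equations)
    while todo:
        s, t = todo.pop(0)
        s, t = subst(s, sigma), subst(t, sigma)
        if s == t:                                   # deletion
            continue
        if is_var(t) and not is_var(s):              # symmetric image
            s, t = t, s
        if is_var(s):
            if occurs(s, t):                         # cycle: x ≃ c p[x]
                return ("negative", "cycle")
            # solution: keep sigma idempotent by substituting into its range
            sigma = {x: subst(u, {s: t}) for x, u in sigma.items()}
            sigma[s] = t
            continue
        if s[0] != t[0]:                             # conflict
            return ("negative", "conflict")
        todo = list(zip(s[1:], t[1:])) + todo        # injectivity
    return ("positive", sigma)


if __name__ == "__main__":
    # Index equations arising when splitting x : Fin (suc m) into fz n / fs n i:
    print(specialise([(("suc", "n"), ("suc", "m"))]))
    # ... and when splitting x : Fin zero, where every constructor is refuted:
    print(specialise([(("suc", "n"), ("zero",))]))
```

A positive result corresponds to a constructor case that survives the split with its indices specialised; a negative result corresponds to a case that is refuted outright.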


Lemma 20 (Specialiser Reduction) If specialisation by unification delivers
m : ΠΔ′. σT ⊢ s : ΠΔ. u⃗ ≃ v⃗ → T
then for any Θ ⊢ θΔ unifying u⃗ and v⃗ we have s θΔ (refl θu⃗) ⇝* m θΔ′.


Proof By induction on the transition sequence. The deletion, solution and 
injectivity steps each preserve this property by construction. 














We can now give a construction which captures our notion of splitting. 


Lemma 21 (Splitting Construction) Suppose Δ ⊢ T : ⋆₁, with Δ tele(⋆₀),
Δˣ; x : D v⃗; ˣΔ tele(⋆₀) and Δˣ; x : D v⃗; ˣΔ ⊢ ⟦p⃗[x]⟧ : Δ. Suppose further that for each
c (Δᶜ) : D u⃗, unifying u⃗ with v⃗ succeeds. Then we may construct an inhabitant
f : ΠΔˣ; x : D v⃗; ˣΔ. [⟦p⃗[x]⟧]T over a context comprising, for each c with positive
success,
m_c : ΠΔ′; σ[x ↦ c Δᶜ]ˣΔ. [⟦σp⃗[c Δᶜ]⟧]T
for some most general idempotent unifier Δ′ ⊢ σ(Δᶜ; Δˣ). In each such case,
f σΔˣ (c σΔᶜ) ˣΔ ⇝* m_c Δ′ ˣΔ


Proof The construction is by basic case_D-analysis of ΠΔˣ; x : D v⃗; ˣΔ. [⟦p⃗[x]⟧]T
at v⃗; x, then specialisation by unification for each method. The required reduction
behaviour follows from lemma 20.














4.4 Translating Structural Recursion 


We are very nearly ready to translate whole functions. For the sake of clarity, 
we introduce one last piece of equipment: 


Definition 22 (Computation Types) When implementing a function f (Δ) :
T, we introduce the family of f-computation types as follows:


Comp-f(Δ) : ⋆₀;   return-f(Δ; t : T) : Comp-f Δ


call-f(Comp-f) : T
call-f Δ (return-f Δ t) ↦ t


where call-f is clearly definable from e_Comp-f.


Comp-f book-keeps the connection between f’s high-level program and the 
low-level term which delivers its semantics. We translate each f-application to the 
corresponding call-f of an f-computation; the latter will compute to a return-f 
value exactly in correspondence with the pattern matching reduction. The trans- 
lation takes the following form: 
Definition 23 (Application Translation) If f (Δ) : T is globally defined, but
AF f :Comp-f A for some f not containing f, the translation {-}f takes 


{eth => call-f ea) 


and proceeds structurally otherwise. Recalling that we require global functions to 
be applied with at least their declared arity, this translation removes f entirely. 


Theorem 24 If f (A) : T has a small type and computation rules [Aj] £ pj ri 
satisfying the function criteria, then there exists an f such that 


At f :Comp-f A and s~r st implies {s} ~t {tH 


Proof It suffices to ensure that the pattern matching reduction schemes are 
faithfully translated. For each 7 such that r; returns a value + e;, we shall have 


{£ [Æi = call-f [pi] [A] ~} call-f [pj] (return-f [pi] {e} ~r {e} 


Without loss of generality, let f be structurally recursive on some x:D y, jth in 
A. The basic recp-analysis of TA. Comp-f A at y; x requires a term of type 


TID. Belowp P D — TIA. D ~ @; x — Comp-f A 


where P = AD. NA. D ~ 7; x — Comp-f A. Specialisation substitutes p; x for D, 
yielding a specialiser [m]s of the required type, with 


m : MA. Belowp P& x — Comp-f A; At recp P [m]s vz A (refl v; x) 
~7 m A (belowp P [m]s vx) : Comp-f A 


by definition of recp and specialisation reduction. We shall take the latter to be 
our f, once we have suitably instantiated m. To do so, we follow f’s splitting 
tree: lemma [2I] justifies the splitting construction at each internal node and at 
each mh y leaf. Each programming problem [A’] f pin the tree corresponds to the 
task of instantiating some m’ : ITA’. Belowp P ([P]] (U; «)) — Comp-f [p] where, 

again by lemma PI] m [P] ~} m A’. 
The splitting finished, it remains to instantiate the m; corresponding to each 
[4;] £ P; — ei. Now, [A = [p;]] takes x : D y to some [p;;]: D t, so we may take 
mi =œ A\A;; H:Belowp P u [piz]. return-f [p;] el 


a 


where el is constructed by replacing each call f 7 in e; by an appropriate appeal 


to H. As f is well typed and structurally recursive, so [A > r] maps z : Dv to 
rj : DW where r; <[p;; |. By construction, Belowp P ù [pij | reduces to a tuple of 
the computations for subobjects of [p;; |. Hence we have a projection g such that 
gH : IA. Ñ; r; ~ v; x — Comp-f A and hence we take call-f 7 (g H 7 (refl w; r;)) 
to replace f r, where by construction of belowp, 
call-f 7 (g (belowp P [m]s ù [piz l) 7 (refl w; r;)) 

~h call-f F ([m]s wr; F (refl w; r;)) 

~h call-f F (m r (belowp P [m]s w r;)) 

={fr¥ 
So, finally, we arrive at


{f mille = = call-f [pj] (m teil (belowp P [m]s ë [pi D) 
7 call-f [p;i] (m; A; (belowp P [m]s ù [pi;])) 


| 
] 
* call-f [7; cee f [pi] [H > belowp P [ms ë [pi; Je!) 
> call-f [p;|(return-f [p] {e;}4) 

= e 











as required. 





5 Conclusions 


We have shown that dependent pattern matching can be translated into a power- 
ful though notationally minimal target language. This constitutes the first proof 
that dependent pattern matching is equivalent to type theory with inductive 
types extended with the K axiom, at the same time reducing the problem of the 
termination of pattern matching as a first-class syntax for structurally recursive 
programs and proofs to the problem of termination of UTT plus K. 

Two of the authors have extended the raw notion of pattern matching that 
we study here with additional language constructs for more concise, expressive 
programming with dependent types [18]. One of the insights from that work is
that the technology for explaining pattern matching and other programming lan- 
guage constructs is as important as the language constructs themselves, since the 
technology can be used to motivate and explain increasingly powerful language 
constructs. 
Abstract. We relate Kamin and Lévy’s original presentation of lexicographic
path orders (LPO), using an inductive definition, to a presentation, which we
will refer to as iterative lexicographic path orders (ILPO), based on Bergstra
and Klop’s definition of recursive path orders by way of an auxiliary term
rewriting system.


Dedicated to Joseph Goguen, in celebration of his 65th birthday. 


1 Introduction 


In his seminal paper [1], Dershowitz introduced the recursive path order (RPO)
method to prove termination of a first-order term rewrite system (TRS) T. The
method is based on lifting a well-quasi-order ≺ on the signature of a TRS to a
well-quasi-order ≺_rpo on the set of terms over the signature [2]. Termination of
the TRS follows if l ≻_rpo r holds for every rule l → r of T.

In Bergstra and Klop [3] an alternative definition of RPO is put forward,
which we call the iterative path order (IPO), the name stressing the way it is
generated; see also Bergstra, Klop and Middeldorp [4]. It is operational in the
sense that it is itself defined by means of an (auxiliary) term rewrite system Lex,
the rules of which depend (only) on the given well-quasi-order ≺.

What has been lacking until now is an understanding of the exact relationship 
between the recursive and iterative approaches to path orders. This will be the 
main subject of our investigation here. We show that both approaches coincide 
in the case of transitive relations (orders). Moreover, we provide a direct proof of 
termination for the iterative path order starting from an arbitrary terminating 
relation on the signature, employing a proof technique due to Buchholz [5]. Both
proofs essentially rely on a natural-number-labelled variant Lex^ω of the auxiliary
TRS Lex, introduced here for the first time.

For the sake of exposition we focus on the restriction of RPO due to Kamin
and Lévy [6] known as the lexicographic path order (LPO); see also Baader and
Nipkow [7]. Restricting the iterative approach accordingly gives rise to what we
call the iterative lexicographic path order (ILPO), as formulated for the first
time in the PhD thesis of Geser [8] and, in a slightly restricted form, by Klop [9].
As far as we know, in this case too the correspondence between the two has not
been investigated in the literature.


K. Futatsugi et al. (Eds.): Goguen Festschrift, LNCS 4060, pp. 541-554, 2006.
© Springer-Verlag Berlin Heidelberg 2006

The proofs that the iterative lexicographic path order is terminating and 
that LPO and ILPO coincide will constitute the body of the paper. In the con- 
clusions, we put forward some ideas on the robustness of this correspondence, 
i.e. whether variations on LPO can be matched by corresponding variations on 
ILPO restoring their coincidence. 


Acknowledgement We thank Alfons Geser for useful remarks. 


2 The Iterative Lexicographic Path Order 


The iterative lexicographic path order (ILPO) is a method to prove termination
of a term rewrite system (TRS). Here a TRS T is terminating if its rewrite
relation →_T is so, i.e. if it does not allow an infinite reduction t₀ →_T t₁ →_T
t₂ →_T ⋯. The method is based on iteratively lifting a terminating relation R
on the signature to a terminating relation R_ilpo on the terms over the signature.
The lifting is iterative in the sense that R_ilpo is defined via the iteration of
reduction steps in an atomic decomposition TRS Lex (depending on R), instead
of recursively as in Dershowitz’s recursive path order method [1]. By its definition
via the atomic decomposition TRS, transitivity and closure under contexts and
substitutions of R_ilpo are automatic, which combined with termination yields
that R_ilpo is a so-called reduction order [10, Definition 6.1.2]. Therefore, for the
TRS T to be terminating it suffices that l R_ilpo r holds for each rule l → r in T.

As a running example to illustrate ILPO, we take the terminating relation
R given by M R A and A R S on the signature of the TRS Ded of addition and
multiplication on natural numbers with the rewrite rules of Table 1, going back
to at least Dedekind [11].⁵





Table 1. Dedekind’s rules for addition and multiplication 


Clearly, the relation R is terminating and ILPO will lift it to a terminating
relation R_ilpo such that l R_ilpo r holds for each rule l → r in Ded, implying
termination of Ded.
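
To see the running example compute, the rules of Ded can be applied to closed terms by a small normaliser. The Python sketch below is illustrative only. The first, third and fourth rules are the ones quoted in the surrounding text; the second rule is assumed to be the usual A(x, S(y)) → S(A(x, y)). Terms are modelled as tuples, and all function names are hypothetical.

```python
ZERO = ("0",)


def S(t):
    return ("S", t)


def A(x, y):
    return ("A", x, y)


def M(x, y):
    return ("M", x, y)


def normalize(t):
    """Innermost normalisation of a closed term over 0, S, A, M."""
    head, args = t[0], [normalize(a) for a in t[1:]]
    if head == "A":
        x, y = args
        if y == ZERO:
            return x                              # A(x, 0)    → x
        return S(normalize(A(x, y[1])))           # A(x, S(y)) → S(A(x, y))   (assumed rule)
    if head == "M":
        x, y = args
        if y == ZERO:
            return ZERO                           # M(x, 0)    → 0
        return normalize(A(x, M(x, y[1])))        # M(x, S(y)) → A(x, M(x, y))
    return (head, *args)                          # 0 and S are constructors


def numeral(n):
    return ZERO if n == 0 else S(numeral(n - 1))


def value(t):
    return 0 if t == ZERO else 1 + value(t[1])


if __name__ == "__main__":
    print(value(normalize(M(numeral(3), numeral(4)))))
```

That the loop implicit in normalize always stops is exactly the termination property that ILPO establishes for Ded; the code does not prove it, it only relies on it.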


5 Dedekind took 1 instead of 0 for the base case. 
For a given relation R over the signature, the definition of R_ilpo proceeds
in two steps: We first define the atomic decomposition TRS Lex depending on
R, over the signature extended with control symbols. Next, the relation R_ilpo
is defined by restricting iteration of Lex-reduction steps, i.e. of →⁺_Lex, to terms
over the original signature.


Definition 1. Let R be a relation on a signature Σ, and let V be a signature
of nullary symbols disjoint from Σ (called variables). The atomic decomposition
TRS Lex is ⟨Σ ⊎ Σ* ⊎ V, ℛ⟩, where:


1. The signature Σ* of control symbols is a copy of Σ, i.e. for each function
symbol f ∈ Σ, Σ* contains a fresh symbol f* having the same arity f has.

2. The rules ℛ are given in Table 2 for arbitrary function symbols f, g in
Σ, with x, y, z disjoint vectors of pairwise disjoint variables of appropriate
lengths.





Table 2. The rules of the atomic decomposition TRS Lex 


The idea of the atomic decomposition Lex is that marking the head symbol of a 
term, by means of the put-rule, corresponds to the obligation to make that term 
smaller, whereas the other rules correspond to atomic ways in which this can be 
brought about: 


1. The select-rule expresses that selecting one of the arguments of a term makes 
it smaller. 

2. The copy-rule expresses that a term t can be made smaller by putting copies 
of terms smaller than t below a head symbol g which is less heavy than the 
head symbol f of t. 

3. The lex-rule expresses that a term t can be made smaller by making one of 
its subterms smaller. At the same time one may replace all the subterms to 
the right (whence the name lex) of this subterm by arbitrary terms that are 
smaller than the whole term t. 
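
The four kinds of atomic step can be made executable, which allows a proposed decomposition to be checked mechanically. The Python sketch below is an illustration under stated assumptions: the rule shapes are reconstructed from the prose description above (put marks an unmarked head symbol from Σ; select, copy and lex apply to a marked head), a term is a tuple (symbol, marked, args), variables are nullary symbols outside the arity table, and all names are hypothetical.

```python
def t(sym, *args, marked=False):
    """Build a term; marked=True stands for the control symbol sym*."""
    return (sym, marked, tuple(args))


def root_steps(term, R, arity):
    """All one-step Lex-reducts obtained by a rule applied at the root."""
    sym, marked, args = term
    if not marked:
        return [(sym, True, args)] if sym in arity else []       # put
    out = list(args)                                              # select
    out += [(g, False, (term,) * arity[g])                        # copy, needs f R g
            for (f, g) in R if f == sym]
    for i, (g, g_marked, g_args) in enumerate(args):              # lex
        if g in arity and not g_marked:
            copies = (term,) * (len(args) - i - 1)                # l, ..., l to the right
            out.append((sym, False, args[:i] + ((g, True, g_args),) + copies))
    return out


def steps(term, R, arity):
    """All one-step Lex-reducts, at the root or inside any argument."""
    sym, marked, args = term
    out = root_steps(term, R, arity)
    for i, a in enumerate(args):
        out += [(sym, marked, args[:i] + (b,) + args[i + 1:])
                for b in steps(a, R, arity)]
    return out


def is_lex_reduction(chain, R, arity):
    """Check that each term in the chain reduces to the next in one Lex-step."""
    return all(b in steps(a, R, arity) for a, b in zip(chain, chain[1:]))


if __name__ == "__main__":
    ARITY = {"0": 0, "S": 1, "A": 2, "M": 2}
    R = {("M", "A"), ("A", "S")}
    x, zero = t("x"), t("0")
    # First Dedekind rule: A(x, 0) -put-> A*(x, 0) -select-> x
    print(is_lex_reduction([t("A", x, zero), t("A", x, zero, marked=True), x], R, ARITY))
```

The checker only validates a given chain; it does not search for one, since the set of reducts of a marked term is not bounded in depth.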


For our running example Ded, the reduction A(x, 0) →_put A*(x, 0) →_select x in
Lex is a decomposition of the first rule into atomic Lex-steps. This also holds for
the other rules of Ded. E.g. the case of the fourth rule is displayed in Figure 1.


M(x, S(y)) →_put    M*(x, S(y))
           →_copy   A(M*(x, S(y)), M*(x, S(y)))
           →_select A(x, M*(x, S(y)))
           →_lex    A(x, M(x, S*(y)))
           →_select A(x, M(x, y))


Fig. 1. Atomic Lex-decomposition of the fourth Dedekind rule


Remark 1. The atomic decomposition TRS Lex is not minimal; in general it
does not yield unique atomic decompositions of rules. For instance, assuming
for the moment that M R 0 would hold, the third rule M(x, 0) → 0 of Ded
could be atomically decomposed into both M(x, 0) →_put M*(x, 0) →_select 0 and
M(x, 0) →_put M*(x, 0) →_copy 0; the term M*(x, 0) is copied to all (zero!) arguments
of the symbol 0.


Definition 2.


1. The iterative lexicographic path order R_ilpo of a relation R on a signature
Σ is the restriction of →⁺_Lex to T(Σ ⊎ V).

2. A TRS is ILPO-terminating if its rules are contained in R_ilpo for some
terminating relation R.


For the TRS Ded we already saw that l →⁺_Lex r holds for each rule l → r. Hence
Ded is ILPO-terminating.

An observation that plays a crucial role in many termination methods is that
a TRS is terminating if and only if it admits a reduction order, i.e. iff its rules
are contained in a terminating (order) relation which is closed under contexts
and substitutions. (See e.g. [10, Prop. 6.1.3].) Note that transitivity and closure
under contexts and substitutions of R_ilpo are ‘built in’ to its definition via the
atomic decomposition TRS. Therefore, in order to show that R_ilpo is a reduction
order, it only remains to be shown that it is terminating. This will be proved in
Section 4. From that result we can then conclude that ILPO-termination implies
termination.

But first, in Section 3, we present some further examples of ILPO-terminating
TRSs.


Remark 2. Although by definition the iterative lexicographic path order R_ilpo is
transitive, even in cases when R isn’t, we do not put stress on this. In particular,
transitivity is not used in the proof that termination lifts from R to R_ilpo.


Remark 3. The iterative lexicographic path order as presented here is a
strengthening of the version of the iterative path order in [9] (which is there still
called recursive path order). The difference is that in [9] instead of the lex-rule
(cf. Table 2) the down-rule is employed:


f*(x, g(y), z) →_down f(x, g*(y), z)


It expresses that a term may be made smaller by making one of its arguments
smaller. The down-rule is a derived rule in our system:


f*(x, g(y), z) →_lex f(x, g*(y), l) →*_select f(x, g*(y), z)


where the ith select-step applies to the ith occurrence of l = f*(x, g(y), z), and
then selects z_i. The implication in the other direction does not hold, as witnessed
by the one-rule TRS f(a, b) → f(b, a), which cannot be proven terminating by
the method presented in [9], but which is ILPO-terminating for a R b:


f(a, b) →_put f*(a, b) →_lex f(a*, f*(a, b)) →_copy f(b, f*(a, b)) →_select f(b, a)


The simplify-left-argument-rule, introduced in the exercises in [9] in order to
prove termination of the Ackermann TRS Ack (see below), is also easily derived
in our system: it simply is the lex-rule with x taken to be the empty vector, i.e.
the leftmost argument must be made smaller. It is easy to see that this
version too is strictly weaker than ILPO-termination.


3 Examples of ILPO-terminating TRSs 


In this section the iterative lexicographic path order method is illustrated by 
applying it to some well-known TRSs. 

The example of Dedekind’s rules for addition and multiplication only employs 
trivial applications of the lex-rule, where z is empty. Proving ILPO-termination 
of the Ackermann function requires non-trivial applications of the lex-rule. 


Example 1 (Ackermann’s function). The TRS Ack has a signature consisting
of the nullary symbol 0, the unary symbol S, and the binary symbol Ack, with
rules as in Table 3.


Ack(0, y) → S(y)
Ack(S(x), 0) → Ack(x, S(0))
Ack(S(x), S(y)) → Ack(x, Ack(S(x), y))


Table 3. Ackermann’s function


For the relation R defined by Ack R S, the TRS Ack is ILPO-terminating as
witnessed by the following atomic decompositions of each of its rules:


- Ack(0, y) →_put Ack*(0, y) →_copy S(Ack*(0, y)) →_select S(y).
- Ack(S(x), 0) →_put Ack*(S(x), 0) →_lex
  Ack(S*(x), Ack*(S(x), 0)) →_select Ack(x, Ack*(S(x), 0)) →_copy
  Ack(x, S(Ack*(S(x), 0))) →_select Ack(x, S(0)).
- Ack(S(x), S(y)) →_put Ack*(S(x), S(y)) →_lex
  Ack(S*(x), Ack*(S(x), S(y))) →_select Ack(x, Ack*(S(x), S(y))) →_lex
  Ack(x, Ack(S(x), S*(y))) →_select Ack(x, Ack(S(x), y)).


Example 2 (Dershowitz and Jouannaud [12]). Consider the string rewrite system
DJ given by the four rules in Table 4.


10 → 0001        11 → 0000
01 → 1           00 → 0


Table 4. String rewrite system on 0,1-words


So we have e.g. the reduction
1101 → 100011 → 10011 → 0001011 → 001011 → 00100000 → 0000010000 → ...
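
The displayed reduction can be replayed mechanically. The Python sketch below is illustrative: it encodes the four rules of Table 4 as string pairs, enumerates all one-step reducts of a word, and checks a chain of words step by step; the function names are hypothetical. The normal_form helper picks an arbitrary reduct at each step (here the lexicographically smallest) and relies on the termination of DJ, which the code itself does not establish.

```python
RULES = [("10", "0001"), ("01", "1"), ("11", "0000"), ("00", "0")]


def one_step(word):
    """All words reachable from `word` by one application of one rule."""
    out = set()
    for lhs, rhs in RULES:
        start = word.find(lhs)
        while start != -1:
            out.add(word[:start] + rhs + word[start + len(lhs):])
            start = word.find(lhs, start + 1)     # also find overlapping redexes
    return out


def is_reduction(chain):
    """Check that each word in the chain rewrites to the next in one step."""
    return all(b in one_step(a) for a, b in zip(chain, chain[1:]))


def normal_form(word):
    """Rewrite until no rule applies, always taking the smallest reduct."""
    while True:
        reducts = one_step(word)
        if not reducts:
            return word
        word = min(reducts)


if __name__ == "__main__":
    chain = ["1101", "100011", "10011", "0001011", "001011", "00100000", "0000010000"]
    print(is_reduction(chain), normal_form("1101"))
```

Since every word of length at least two contains a left-hand side, the normal forms of non-empty words are exactly the single letters 0 and 1.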


To capture this string rewrite system as a term rewrite system, the symbols 0
and 1 are perceived as unary function symbols and the rules are read accordingly;
e.g. 10 → 0001 is the term rewrite rule


1(0(x)) → 0(0(0(1(x))))

To show ILPO-termination for DJ, we set 1 R 0 and check l R_ilpo r for every
rule l → r. For the displayed rule the corresponding atomic decomposition is
shown in Figure 2 (after dropping all parentheses).


10x →_put    1*0x
    →_copy   01*0x
    →_copy   001*0x
    →_copy   0001*0x
    →_lex    00010*x
    →_select 0001x


Fig. 2. Atomic Lex-decomposition of the rule 10 → 0001
Example 3 (Primitive recursion). Let a TRS on the natural numbers be given,
having a unary function symbol g and a ternary function symbol h. Suppose we
adjoin a binary symbol f to the signature, with defining rules


f(0, x) → g(x)
f(S(x), y) → h(f(x, y), x, y)


If the original TRS is ILPO-terminating, say for the terminating relation R on its
signature, then the resulting system is ILPO-terminating again, as can be easily
seen after adjoining f R g and f R h to R (this yields a terminating relation
again).


4 ILPO-termination Implies Termination 


The remaining task is now to show that if a relation R is terminating, then Ripo 
is terminating as well. As explained in Section P] Riipo is then a reduction order, 
and it follows that ILPO-termination implies termination. 

Since the rewrite rules of Ripo are given by the restriction of >E toT(X wW 
V), termination of the atomic decomposition TRS £ex would be sufficient for 
termination of Ripo. However, Lex is in general not terminating. For instance, 
in case of the running example we have, despite R being terminating: 


A(z, y) put A* (x,y) — copy S(A*(a,y)) —~ copy S(S(A* (x, y))) copy ++- 
We even have cycles 
A* (x,y) — copy S(A*(z,y)) put S*(A* (x, y)) select A* (x, y) 


In either case, non-termination is ‘caused’ by the left-hand side of the copy-rule 
being a subterm of its right-hand side; a priori such an iteration is not bounded. 
Similar examples can be given with the lex-rule, which is also self-embedding. 

However, observe that in both of the infinite reductions the control symbol $A^*$
is 'used' infinitely often by the copy-rule. We will show that this is necessary
in any infinite reduction. More precisely, we show that if for each control symbol a bound
is given in advance on how often it can be used in the copy- and lex-rules, this
yields a terminating TRS $\mathcal{L}ex^\omega$. Since in any given atomic decomposition
$l \to^+_{\mathcal{L}ex} r$ of a rule $l \to r$, any control symbol is only used finitely often (certainly
not more often than the length of the decomposition), the relations $\to^+_{\mathcal{L}ex}$ and
$\to^+_{\mathcal{L}ex^\omega}$ coincide on the unmarked terms. We will exploit this fact by proving
termination of $R_{\mathit{ilpo}}$ via termination of $\mathcal{L}ex^\omega$.


Definition 3. Let $R$ be a relation on a signature $\Sigma$, and let $V$ be a signature of
nullary symbols disjoint from $\Sigma$. The TRS $\mathcal{L}ex^\omega$ is $(\Sigma \uplus \Sigma^\omega \uplus V, R^\omega)$:

1. The signature $\Sigma^\omega$ of $\omega$-control symbols consists of $\omega$ copies of $\Sigma$, i.e. for
each symbol $f \in \Sigma$ and natural number $n$, $\Sigma^\omega$ contains a fresh symbol $f^n$
having the arity $f$ has.

2. The rules $R^\omega$ are given in Table 5, for arbitrary symbols $f$, $g$ in $\Sigma$ and
natural number $n$, with $x$, $y$, $z$ disjoint vectors of pairwise disjoint variables
of appropriate lengths.

Table 5. The rules of $\mathcal{L}ex^\omega$


The TRS $\mathcal{L}ex$ (Definition 1) is seen to be a homomorphic image of $\mathcal{L}ex^\omega$ by
mapping $f^n$ to $f^*$, for any natural number $n$. Vice versa, reductions in $\mathcal{L}ex$ can
be 'lifted' to $\mathcal{L}ex^\omega$.

Lemma 1. $\to^+_{\mathcal{L}ex}$ and $\to^+_{\mathcal{L}ex^\omega}$ coincide as relations restricted to $T(\Sigma \uplus V)$.

Proof.

($\subseteq$) One shows that $\to^+_{\mathcal{L}ex}$ is included in $\to^+_{\mathcal{L}ex^\omega}$ by formalizing the reasoning
already indicated above. In particular, one can translate (lift) a finite reduction
of length $n$ in $\mathcal{L}ex$ to $\mathcal{L}ex^\omega$ by replacing all marks ($*$) in the begin term
by a natural number greater than $n$ and, along the reduction, likewise each $*$
introduced by an application of the put-rule. Numerical values for the other
marks then follow automatically by applying the $\mathcal{L}ex^\omega$-rules that correspond
to the original $\mathcal{L}ex$-rules.

The result then follows because, if $t \to^+_{\mathcal{L}ex} s$ and the terms $t$, $s$ are in
$T(\Sigma \uplus V)$, i.e. do not contain marks, then the transformation of a $\mathcal{L}ex$-reduction
from $t$ to $s$ to $\mathcal{L}ex^\omega$ leaves the begin and end terms $t$ and $s$ untouched.

($\supseteq$) Every $\to_{\mathcal{L}ex^\omega}$ step is a $\to_{\mathcal{L}ex}$ step, by the homomorphism. □



Example 4. The atomic decomposition displayed in Figure 1 can be lifted to:

$$M(x, S(y)) \to_{\mathit{put}} M^2(x, S(y)) \to_{\mathit{copy}} A(M^1(x, S(y)), M^1(x, S(y))) \to_{\mathit{select}} A(x, M^1(x, S(y))) \to_{\mathit{lex}} A(x, M(x, S^0(y))) \to_{\mathit{select}} A(x, M(x, y)).$$
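The lifted reduction of Example 4 can be replayed mechanically. The following Python sketch is an illustration only: the term encoding, the function names and the helper `at` are assumptions of this sketch, and the rule shapes are read off from the embedded rules of Table 6 rather than from Table 5 itself. Each of copy and lex consumes one unit of the numeric mark, which is exactly the bound that rules out the infinite copy-chains shown earlier.

```python
# Terms: a variable is a str; a function term is (symbol, mark, args), where
# mark is None for an unmarked symbol and a natural number n for f^n.
# This encoding is an assumption made for this sketch.

def put(t, n):
    """f(x) -> f^n(x): mark an unmarked head symbol with a chosen bound n."""
    f, mark, args = t
    if mark is not None:
        raise ValueError("put applies to an unmarked head symbol")
    return (f, n, args)

def select(t, i):
    """f^n(x) -> x_i: project a marked term onto its i-th argument (0-based)."""
    f, mark, args = t
    if mark is None:
        raise ValueError("select needs a marked head symbol")
    return args[i]

def copy(t, g, arity, R):
    """f^(n+1)(x) -> g(f^n(x), ..., f^n(x)), provided f R g."""
    f, mark, args = t
    if mark is None or mark < 1:
        raise ValueError("copy needs a mark of the form n+1")
    if (f, g) not in R:
        raise ValueError("copy needs f R g")
    return (g, None, ((f, mark - 1, args),) * arity)

def lex(t, i):
    """f^(n+1)(x, g(y), z) -> f(x, g^n(y), l, ..., l) with l = f^n(x, g(y), z)."""
    f, mark, args = t
    if mark is None or mark < 1:
        raise ValueError("lex needs a mark of the form n+1")
    if isinstance(args[i], str) or args[i][1] is not None:
        raise ValueError("lex needs an unmarked function term at position i")
    g, _, ys = args[i]
    l = (f, mark - 1, args)
    return (f, None, args[:i] + ((g, mark - 1, ys),) + (l,) * (len(args) - i - 1))

def at(t, path, step):
    """Apply the rewrite function `step` at the position `path` inside t."""
    if not path:
        return step(t)
    f, mark, args = t
    i = path[0]
    return (f, mark, args[:i] + (at(args[i], path[1:], step),) + args[i + 1:])

# Replay of Example 4 with M R A.
R = {("M", "A")}
t0 = ("M", None, ("x", ("S", None, ("y",))))   # M(x, S(y))
t1 = put(t0, 2)                                 # M^2(x, S(y))
t2 = copy(t1, "A", 2, R)                        # A(M^1(x, S(y)), M^1(x, S(y)))
t3 = at(t2, (0,), lambda u: select(u, 0))       # A(x, M^1(x, S(y)))
t4 = at(t3, (1,), lambda u: lex(u, 1))          # A(x, M(x, S^0(y)))
t5 = at(t4, (1, 1), lambda u: select(u, 0))     # A(x, M(x, y))
```

Choosing the initial mark 2 in the put-step suffices here because the reduction uses the control symbol for M only twice (one copy, one lex); a mark of 0 would make the copy-step raise an error.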


The main theorem is proven by employing an ingenious (constructive) proof
technique due to Buchholz (see footnote 6).


Lemma 2. If $R$ is terminating, then $\mathcal{L}ex^\omega$ is terminating.


6 The technique has been discovered independently by Jouannaud and Rubio E3], 
who show that it combines well with the Tait-Girard reducibility technique (both 
are essentially based on induction on terms), leading to a powerful termination proof 
technique also applicable to higher-order term rewriting.

Proof. To make the notation uniform, in the sense that all function symbols will
carry a label, we employ $f^\omega(t)$ to denote the term $f(t)$ for an unmarked symbol
$f$. This allows us to write any $\mathcal{L}ex^\omega$-term uniquely as an $\omega$-marked term of the
form $f^\alpha(t)$ for some unmarked symbol $f$, some ordinal $\alpha$ and some vector of
terms $t$. The ordinal $\alpha$ will be a natural number $n$ or $\omega$. In the crucial induction
in this proof we will make use of the fact that in the ordering of the ordinals we
have $\omega > n$ for each natural number $n$.

We prove by induction on the construction of terms that any $\omega$-marked term
is terminating. To that end it is sufficient to show that any $\omega$-marked term
$f^\alpha(t)$ is terminating under the assumption (the induction hypothesis) that its
arguments $t$ are terminating.

So assume that $t_1, \ldots, t_n$ are terminating, with $n$ the arity of $f$. We prove
that $f^\alpha(t)$ is terminating by a further induction on the triple consisting of $f$,
$t$, and $\alpha$ in the lexicographic product of the relations $R$, $(\to_{\mathcal{L}ex^\omega}{\restriction}\,SN)^n$ and $>$.
Here $(\to_{\mathcal{L}ex^\omega}{\restriction}\,SN)^n$ is the $n$-fold lexicographic product of the terminating part
of $\to_{\mathcal{L}ex^\omega}$, with $n$ the arity of $f$.

Clearly, the term $f^\alpha(t)$ is terminating if all its one-step $\to_{\mathcal{L}ex^\omega}$-reducts are (see footnote 7).
The latter we prove by distinguishing cases on the type of the reduction step.

1. If the step is a head step, we perform a further case analysis on the rule
applied.

(put) The result follows by the IH for the third component of the triple,
since $\omega > m$ for any natural number $m$.

(select) The result follows by the termination assumption for the $t$.

(copy) Then $\alpha$ is of the form $m+1$ for some natural number $m$, and the
reduct has shape $g^\omega(f^m(t), \ldots, f^m(t))$ for some $g$ such that $f \mathrel{R} g$. By
the IH for the third component, each of the $f^m(t)$ is terminating. Hence,
by the IH for the first component, the reduct is terminating.

(lex) Then $\alpha$ is of the form $m+1$ for some natural number $m$ and the reduct
has shape $f^\omega(t_1, \ldots, t_{i-1}, g^m(s), f^m(t), \ldots, f^m(t))$, with $t_i = g^\omega(s)$. Each
$f^m(t)$ is terminating by the IH for the third component. Hence, the
reduct is terminating by the IH for the second component, since $g^m(s)$
is a one-step $\mathcal{L}ex^\omega$-reduct of $t_i$ (for the put-rule).

2. If the step is a non-head step, then it rewrites some direct argument, and
the result follows by the IH for the second component. □

Theorem 1. ILPO-termination implies termination.

Proof. Suppose the TRS $\mathcal{T}$ is ILPO-terminating for some terminating relation
$R$ on its signature, i.e. $l \mathrel{R_{\mathit{ilpo}}} r$ holds for each rule $l \to r$ in $\mathcal{T}$.

Since $R_{\mathit{ilpo}}$ is defined as a restriction of $\to^+_{\mathcal{L}ex}$, which in turn coincides with
$\to^+_{\mathcal{L}ex^\omega}$ by Lemma 1, it is a transitive relation that is closed under contexts and
substitutions and, by Lemma 2, also terminating. Hence, $R_{\mathit{ilpo}}$ is a reduction
order, and therefore $\mathcal{T}$ must be terminating. □

7 This observation can be used as an inductive characterization of termination: a term
is terminating if and only if all its one-step reducts are. In a constructive rendering
of the proof one can take this characterization as the definition of termination.

Hence termination of the example TRSs follows from their ILPO-termination
as established above.


Remark 4. It is worth noting that Buchholz's proof technique can also be applied
to non-simplifying TRSs. For instance, for proving termination of the one-rule
TRS $f(f(x)) \to f(g(f(x)))$ the technique boils down to showing that any instance
$f(g(f(t)))$ of the right-hand side is terminating on the assumption that the direct
subterm $f(t)$ of the corresponding instance of the left-hand side is. This follows
by 'induction' on the right-hand side and cases on the shape of the left-hand
side of the rule: $f(g(f(t)))$ can be rewritten neither at the head nor at position
1, hence it is terminating if $f(t)$ is.


5 Equivalence of ILPO with the Recursive Lexicographic 
Path Order 


We show that ILPO is at least as powerful as the recursively defined lexicographic
path order found in the literature, and is equivalent to it for transitive
relations. The following definition of $>_{\mathit{lpo}}$ for a given strict order $>$ on the
signature $\Sigma$ is copied verbatim from Definition 5.4.12 in the textbook by Baader
and Nipkow [7].


Definition 4. Let $\Sigma$ be a finite signature and $>$ be a strict order on $\Sigma$. The
lexicographic path order $>_{\mathit{lpo}}$ on $T(\Sigma, V)$ induced by $>$ is defined as follows:
$t >_{\mathit{lpo}} s$ iff

(LPO1) $s \in Var(t)$ and $t \neq s$, or
(LPO2) $t = f(t_1, \ldots, t_m)$, $s = g(s_1, \ldots, s_n)$, and
  (LPO2a) there exists $i$, $1 \le i \le m$, with $t_i \ge_{\mathit{lpo}} s$, or
  (LPO2b) $f > g$ and $t >_{\mathit{lpo}} s_j$ for all $j$, $1 \le j \le n$, or
  (LPO2c) $f = g$, $t >_{\mathit{lpo}} s_j$ for all $j$, $1 \le j \le n$, and there exists $i$, $1 \le i \le m$,
  such that $t_1 = s_1, \ldots, t_{i-1} = s_{i-1}$ and $t_i >_{\mathit{lpo}} s_i$.


It is easy to see that this is still a correct recursive definition for $>$ being an
arbitrary relation $R$, yielding $R_{\mathit{lpo}}$. We call a TRS $\mathcal{T} = (\Sigma, \mathcal{R})$ LPO-terminating
for a terminating relation $R$, if $\mathcal{R} \subseteq R_{\mathit{lpo}}$.
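The recursive clauses of Definition 4 translate directly into a recursive check. The following Python sketch is an illustration under stated assumptions: the term encoding (variables as strings, function terms as tuples) and the name `lpo_gt` are choices of this sketch, and the relation $R$ is passed as a set of pairs, so that an arbitrary relation can be used in place of a strict order.

```python
# Terms: a variable is a str; a function term is a tuple (f, t_1, ..., t_m).
# The relation R on the signature is a set of pairs (f, g), read as f R g.

def variables(t):
    """Return the set of variables occurring in the term t."""
    if isinstance(t, str):
        return {t}
    result = set()
    for arg in t[1:]:
        result |= variables(arg)
    return result

def lpo_gt(t, s, R):
    """Decide t R_lpo s by following the clauses (LPO1) and (LPO2a)-(LPO2c)."""
    if isinstance(t, str):
        return False                      # no clause applies to a variable t
    if isinstance(s, str):
        return s in variables(t)          # (LPO1); t != s since t is no variable
    f, ts = t[0], t[1:]
    g, ss = s[0], s[1:]
    # (LPO2a): some argument of t equals s or is greater than s
    if any(ti == s or lpo_gt(ti, s, R) for ti in ts):
        return True
    # (LPO2b): f R g and t is greater than every argument of s
    if (f, g) in R and all(lpo_gt(t, sj, R) for sj in ss):
        return True
    # (LPO2c): equal heads, t greater than every s_j, lexicographic decrease
    if f == g and len(ts) == len(ss) and all(lpo_gt(t, sj, R) for sj in ss):
        return any(ts[:i] == ss[:i] and lpo_gt(ts[i], ss[i], R)
                   for i in range(len(ts)))
    return False

# The rule 1(0(x)) -> 0(0(0(1(x)))) with 1 R 0, and M(x, S(y)) -> A(x, M(x, y)) with M R A.
lhs1, rhs1 = ("1", ("0", "x")), ("0", ("0", ("0", ("1", "x"))))
lhs2, rhs2 = ("M", "x", ("S", "y")), ("A", "x", ("M", "x", "y"))
```

Every recursive call decreases the combined size of the two terms, so the check terminates for any relation $R$; whether the resulting relation is well-founded is a separate matter and depends on $R$ being terminating.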


Lemma 3. $R_{\mathit{lpo}} \subseteq R_{\mathit{ilpo}}$, for any relation $R$.


Proof. We show by induction on the definition of $t \mathrel{R_{\mathit{lpo}}} s$ that $t^* \to^*_{\mathcal{L}ex} s$, where
$(f(t))^* = f^*(t)$. This suffices, since $t \to_{\mathit{put}} t^*$ and $t$, $s$ are not marked.

(LPO1) If $s \in Var(t)$ and $t \neq s$, then the result follows by repeatedly
selecting the subterm on a path to an occurrence of $s$ in $t$.

(LPO2) Otherwise, let $t = f(t_1, \ldots, t_m)$, $s = g(s_1, \ldots, s_n)$.

(LPO2a) Suppose there exists $i$, $1 \le i \le m$, with either $t_i = s$ or $t_i \mathrel{R_{\mathit{lpo}}} s$.
In the former case, the result follows by a single application of the select-rule
for index $i$. In the latter case, this step is followed by an application
of the put-rule after which the result follows by the IH.

(LPO2b) Suppose $f \mathrel{R} g$ and $t \mathrel{R_{\mathit{lpo}}} s_j$ for all $j$, $1 \le j \le n$. Then the result
follows by a single application of the copy-rule and $n$ applications of the IH.

(LPO2c) Suppose $f = g$, $t \mathrel{R_{\mathit{lpo}}} s_j$ for all $j$, $1 \le j \le n$, and there exists $i$,
$1 \le i \le m$, such that $t_1 = s_1, \ldots, t_{i-1} = s_{i-1}$ and $t_i \mathrel{R_{\mathit{lpo}}} s_i$. Then
the result follows by a single application of the lex-rule, selecting the $i$th
argument, and the IH for $t_i \mathrel{R_{\mathit{lpo}}} s_i$ and $t \mathrel{R_{\mathit{lpo}}} s_j$ for $j$, $i < j \le n$. □

We call the $\mathcal{L}ex$-strategy implicit in this proof the wave strategy. The idea is that
the marked positions in a term represent the wave front, which moves downwards,
i.e. from the root in the direction of the subtrees of a left-hand side, generating an
ever growing prefix of the right-hand side behind it. This is visualised abstractly
in Figure 3, and in Figure 4 for the atomic $\mathcal{L}ex$-decomposition of $M(x, S(y)) \to A(x, M(x, y))$
of Figure 1. (In fact, all $\mathcal{L}ex$-reductions given above adhere to the
wave strategy.) One can prove a converse to Lemma 3 by a detailed proof-theoretic
analysis, showing that any $\mathcal{L}ex$-reduction can be transformed into a
wave reduction. The upshot is that in general $R_{\mathit{ilpo}} = (R_{\mathit{lpo}})^+$. As a corollary
we then have that ILPO is equivalent with LPO for any strict order, and that
$R_{\mathit{ilpo}}$ is decidable, in case $R$ is a terminating relation for which reachability is
decidable: simply 'try all waves up to the size of the right-hand side' (see footnote 8).

Fig. 3. Wave strategy

Fig. 4. Wave strategy for atomic $\mathcal{L}ex$-decomposition of $M(x, S(y)) \to A(x, M(x, y))$


Remark 5. Note that $(R^+)_{\mathit{ilpo}}$ may differ from $R_{\mathit{ilpo}}$. Consider the signature
consisting of nullary symbols $a$, $b$, and unary symbols $f$, $g$, and the terminating
relation $f \mathrel{R} b \mathrel{R} g$. Then the one-rule TRS $f(a) \to g(a)$ is not ILPO-terminating,
but it is ILPO-terminating for $R^+$. The problem is that making $f(a)$ smaller
using $R$ forces erasure of its argument $a$, because $b$ is nullary.


A proof-theoretic analysis of the wave strategy is beyond the scope of this paper.
Here we will be satisfied by giving a rather ad hoc proof of the converse of
Lemma 3 for the case where we start with a transitive relation $R$.


Lemma 4. $R_{\mathit{ilpo}} \subseteq R_{\mathit{lpo}}$, for any transitive relation $R$.


Proof. Fix the relation $R$. By definition, if $t \mathrel{R_{\mathit{ilpo}}} s$ then $t \to^+_{\mathcal{L}ex} s$. By Lemma 1
then also $t \to^+_{\mathcal{L}ex^\omega} s$. To show that this implies $t \mathrel{R_{\mathit{lpo}}} s$, we employ a homomorphic
embedding $\varepsilon$ of $\omega$-marked terms (as introduced in the proof of Lemma 2)
defined by $f^\alpha(u) \mapsto f(\varepsilon(u), \alpha)$. Here the terms in the range of $\varepsilon$ are terms over
the signature obtained from $\Sigma$ by increasing the arity of every function symbol
by 1, and adjoining the ordinals up to $\omega$ as nullary symbols. The idea of the
embedding is that every function symbol gets an extra final argument signifying
how many times the symbol may be 'used'. Initially (unmarked) it is set to $\omega$.
Embedding the TRS $\mathcal{L}ex^\omega$ (see Table 5) yields the TRS $\varepsilon(\mathcal{L}ex^\omega)$ having rules
given in Table 6.

$$\begin{array}{rcll}
f(x, \omega) & \to_{\mathit{put}} & f(x, n) & \\
f(x, n) & \to_{\mathit{select}} & x_i & (1 \le i \le |x|) \\
f(x, n+1) & \to_{\mathit{copy}} & g(f(x, n), \ldots, f(x, n), \omega) & (f \mathrel{R} g) \\
f(x, g(y, \omega), z, n+1) & \to_{\mathit{lex}} & f(x, g(y, n), l, \ldots, l, \omega) & (l = f(x, g(y, \omega), z, n))
\end{array}$$

Table 6. Rules of $\varepsilon(\mathcal{L}ex^\omega)$

By definition of $\varepsilon$, $t \to^+_{\mathcal{L}ex^\omega} s$ implies $\varepsilon(t) \to^+_{\varepsilon(\mathcal{L}ex^\omega)} \varepsilon(s)$. It is easy to verify that
each of the $\varepsilon(\mathcal{L}ex^\omega)$-rules is contained in $\varepsilon(R)_{\mathit{lpo}}$, where $\varepsilon(R)$ is obtained by taking
the union of $R$ and the natural greater-than relation $>$ on the ordinal symbols,
and relating every function symbol to any ordinal symbol. Note that $\varepsilon(R)$ is
transitive since $R$ and $>$ are. Since the relation $\varepsilon(R)_{\mathit{lpo}}$ is closed under contexts
and substitutions and is transitive if $\varepsilon(R)$ is (see footnote 9), we conclude that
$\varepsilon(t) \mathrel{\varepsilon(R)_{\mathit{lpo}}} \varepsilon(s)$. That this implies $t \mathrel{R_{\mathit{lpo}}} s$ follows by an easy induction on
the definition of the former. The crux of the proof is that the adjoined ordinal
symbols do not relate to the original symbols from $\Sigma$. The only problematic
cases (since then the IH could not be applied) are:

(LPO2a) holds since either $\omega = \varepsilon(s)$ or $\omega \mathrel{\varepsilon(R)_{\mathit{lpo}}} \varepsilon(s)$. Neither can be the
case since $\omega$ is not related to symbols in $\Sigma$, in particular not to the head
symbol of $\varepsilon(s)$ (which is the same as the head of $s$).

(LPO2c) holds since $\omega \mathrel{\varepsilon(R)_{\mathit{lpo}}} \omega$. This obviously cannot be the case.

In the other cases the IH does the job, e.g.

(LPO2a) holds since either $\varepsilon(t_i) = \varepsilon(s)$, or $\varepsilon(t_i) \mathrel{\varepsilon(R)_{\mathit{lpo}}} \varepsilon(s)$, for some $i$.
Then either $t_i = s$ by injectivity of $\varepsilon$, or $t_i \mathrel{R_{\mathit{lpo}}} s$ by the IH, and we
conclude $t \mathrel{R_{\mathit{lpo}}} s$. □

8 This is somehow analogous to the way in which rippling guides the search for a proof
of a goal from a given [14].

9 Note that these properties need to be verified separately for the recursive lexicographic
path order (or any variation on it). In case of the iterative lexicographic
path order (or any variation on it), these properties hold automatically by it being
given by a TRS, allowing one to focus on establishing termination.


Combining Lemmas 3 and 4 yields our second main result.


Theorem 2. $>_{\mathit{ilpo}} \; = \; >_{\mathit{lpo}}$, for any transitive relation (order) $>$.


6 Conclusion 


We have shown that our iterative set-up of ILPO can serve as an independent 
alternative to the classical recursive treatment of LPO. It can be seen as being 
obtained by decomposing the recursive definition, extracting atomic rules from 
the inductive clauses. From this perspective it is only natural that we have 
taken an arbitrary terminating relation (instead of order) on the signature as 
our starting point, so one could speak, in the spirit of Persson’s presentation of 
recursive path relations [15], of iterative lexicographic path relations.

We claim that the correspondence between recursive and iterative ways of 
specifying path orders is robust, i.e. goes through for variants of LPO like the 
embedding relation and recursive path orders. Substantiating the claim is left to 
future research. 

Another direction for further investigation is suggested by Remark 4. It seems
that an analogous argument can be used to yield soundness of Arts and Giesl's
dependency-pair technique for proving termination. (See e.g. [10, Section 6.5.5].)
Thus, whereas non-simplifying TRSs are traditionally out of the scope of the
recursive path order method, by their termination proof being tied to Kruskal's
Tree Theorem, Buchholz's technique will give us a handle on a uniform treatment
of both path orders and the dependency-pair technique.

Abstract. The semantic constructions and results for definite programs do not
extend when dealing with negation. The main problem is related to a well-known 
problem in the area of algebraic specification: if we fix a constraint domain as 
a given model, its free extension by means of a set of Horn clauses defining a 
set of new predicates is semicomputable. However, if the language of the exten- 
sion is richer than Horn clauses its free extension (if it exists) is not necessarily 
semicomputable. In this paper we present a framework that allows us to deal with 
these problems in a novel way. This framework is based on two main ideas: a 
reformulation of the notion of constraint domain and a functorial presentation 
of our semantics. In particular, the semantics of a logic program P is defined in
terms of three functors, $(\mathcal{OP}_P, \mathcal{ALG}_P, \mathcal{LOG}_P)$, that apply to constraint domains
and provide the operational, the least fixpoint and the logical semantics of P,
respectively. The idea is that the application of $\mathcal{OP}_P$ to a specific constraint solver
provides the operational semantics of P that uses this solver; the application of
$\mathcal{ALG}_P$ to a specific domain provides the least fixpoint of P over this domain;
and the application of $\mathcal{LOG}_P$ to a theory of constraints provides the logic theory
associated to P. We prove that these three functors are in some sense equivalent.


1 Introduction 


Constraint logic programming was introduced in ([9]) as a powerful and conceptually
simple extension of logic programming. Following that seminal paper, the semantics
of definite (constraint) logic programs has been studied in detail (see, e.g. [10], [11]).
However, the constructions and results for definite programs do not extend when deal- 
ing with negation. The main problem is related to a well-known problem in the area of 
algebraic specification: if we fix a constraint domain as a given model, its free extension 
by means of a set of Horn clauses defining a set of new predicates is semicomputable. 
However, if the language of the extension is richer than Horn clauses its free extension 
(if it exists) is not necessarily semicomputable ([8]). Now, when working without nega- 
tion we are in the former case, but when working with negation we are in the latter case. 
In particular, this implies that the results about the soundness and completeness of the
operational semantics with respect to the logical and algebraic semantics of a definite
constraint logic program do not extend to the case of programs with negation, except 
when we impose some restrictions to these programs. 


The only approach that we know of dealing with this problem is ([19]). In that paper,
Stuckey presents one of the first operational semantics which is proven complete
for programs that include (constructive) negation. Although we use a different
operational semantics, that paper has had an important influence on our work on negation.
The results in ([19]) were very important when applied to the case of standard
(non-constrained) logic programs because they provided some good insights about
constructive negation. However, the general version (i.e., logic programs over an arbitrary
constraint domain) is not so interesting (in our opinion). The reason is that the completeness
results are obtained only for programs over admissible constraints. We think that this
restriction on the constraints that can be used in a program is not properly justified.


In our opinion, the problem when dealing with negation is not on the class of con- 
straints considered, but rather, in the notion of constraint domain used. In particular, 
the notion of constraint domain used in the context of definite programs is not adequate 
when dealing with negation. Instead, we propose a small reformulation of the notion of 
constraint domain. To be precise, we propose that a domain should be defined in terms 
of a class of elementarily equivalent models and not in terms of a single model. With 
this variation we show the equivalence of the logical, operational, and fixpoint seman- 
tics of programs with negation without needing to restrict the class of constraints. 


The logical semantics that we have used is the standard Clark-Kunen 3-valued
completion of programs (see, e.g. [19]). The fixpoint semantics that we are using is a
variation of other well-known fixpoint semantics used to deal with negation ([5, 19, 6, 15]).
Finally, the operational semantics that we are using is an extension of a semantics called
BCN that we have defined in ([16]) for the case of programs without constraints. The
main reason for using this semantics and not Stuckey’s semantics is that our seman- 
tics is simpler. This implies having simpler proofs for our results. In particular, we do 
not claim that our semantics is better than Stuckey’s (nor that it is worse). A proper 
comparison of these two semantics and of others like would need experimental 
work. We have a prototype implementation of BCN ([{I]]), but we do not know if the 
other approaches have been implemented. Anyhow, the pragmatic virtues of the various 
operational approaches to constructive negation are not a relevant issue in this paper. 


Our semantics is functorial. We consider that a constraint logic program is a pro- 
gram that is parameterized by the given constraint domain, i.e., that the semantics of a 
program should be some kind of mapping. However, we also think that working in a 
categorical setting provides some additional advantages that are shown in the paper. 


The paper is organized as follows. In the following section we give a short
introduction to the semantics of (definite) constraint logic programs. In Section 3, we
discuss the inadequacy of the standard notion of constraint domain when dealing with
negation and propose a new one. In Section 4 we study the semantics of programs
when defined over a given arbitrary constraint domain. Then, in Section 5,
we define several categories for defining the various semantic domains involved and
define the functorial semantics of logic programs. Finally, in Section 6 we prove the
equivalence of the logical, fixpoint and operational semantics.

2 Preliminaries


A signature $\Sigma$ consists of a pair of sets $(FS_\Sigma, PS_\Sigma)$ of function and predicate symbols,
respectively, with some associated arity. $T_\Sigma(X)$ denotes the set of all first-order $\Sigma$-terms
over variables from $X$, and $T_\Sigma$ denotes the set of all ground terms. A literal is either an
atom $p(t_1, \ldots, t_n)$ (namely a positive literal) or a negated atom $\neg p(t_1, \ldots, t_n)$ (namely a
negative literal). The set $Form_\Sigma$ is formed by all first-order $\Sigma$-formulas written (from
atoms) using connectives $\neg, \wedge, \vee, \rightarrow, \leftrightarrow$ and quantifiers $\forall, \exists$. We denote by $free(\varphi)$ the
set of all free variables occurring in $\varphi$. $\varphi(X)$ specifies that $free(\varphi) \subseteq X$. $Sent_\Sigma$ is the set
of all $\varphi \in Form_\Sigma$ such that $free(\varphi) = \emptyset$, called $\Sigma$-sentences. By $\varphi^{\forall \setminus Z}$ (resp. $\varphi^{\exists \setminus Z}$) we
denote the formula $\forall x_1 \ldots \forall x_n(\varphi)$ (resp. $\exists x_1 \ldots \exists x_n(\varphi)$), where $x_1 \ldots x_n$ are the variables
in $free(\varphi) \setminus Z$. In particular, the universal (resp. existential) closure, that is $\varphi^{\forall \setminus \emptyset}$ (resp.
$\varphi^{\exists \setminus \emptyset}$), is denoted by $\varphi^\forall$ (resp. $\varphi^\exists$).

The semantics of normal logic programs is defined using a concrete three-valued
extension of the classical two-valued interpretation of logical symbols. The connectives
$\neg, \wedge, \vee$ and quantifiers ($\forall, \exists$) are interpreted as in Kleene's logic ([12]). However, $\leftrightarrow$ is
interpreted as the identity of truth-values (hence, $\leftrightarrow$ is two-valued). Moreover, to make
$\varphi \leftrightarrow \psi$ logically equivalent to $(\varphi \rightarrow \psi) \wedge (\psi \rightarrow \varphi)$, Przymusinski's interpretation ([17])
of $\rightarrow$ is required. It is also two-valued and gives the value $f$ exactly in the following
three cases: $t \rightarrow f$, $t \rightarrow u$ and $u \rightarrow f$. Equality is two-valued also. Following [B], it is
easy to see that the above three-valued logic satisfies (as classical first-order logic does)
all of the basic metalogical properties, in particular completeness and compactness.
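The truth tables just described are small enough to check exhaustively. The following Python sketch is an illustration only; the encoding of the truth values as the strings "t", "f", "u" and all function names are assumptions of this sketch. It encodes Kleene's connectives, the identity reading of $\leftrightarrow$ and Przymusinski's $\rightarrow$, so that the claimed equivalence of $\varphi \leftrightarrow \psi$ with $(\varphi \rightarrow \psi) \wedge (\psi \rightarrow \varphi)$ can be verified over all nine pairs of values.

```python
# Truth values of the three-valued logic, ordered f < u < t.
T, F, U = "t", "f", "u"
ORDER = {F: 0, U: 1, T: 2}

def neg(a):
    """Kleene negation: swaps t and f, leaves u unchanged."""
    return {T: F, F: T, U: U}[a]

def conj(a, b):
    """Kleene conjunction: the minimum in the order f < u < t."""
    return min(a, b, key=ORDER.get)

def disj(a, b):
    """Kleene disjunction: the maximum in the order f < u < t."""
    return max(a, b, key=ORDER.get)

def iff(a, b):
    """Equivalence as identity of truth values; the result is never u."""
    return T if a == b else F

def implies(a, b):
    """Przymusinski's implication: f exactly for t->f, t->u and u->f."""
    return F if (a, b) in {(T, F), (T, U), (U, F)} else T

# Exhaustive check of the equivalence stated in the text.
VALUES = (T, F, U)
EQUIVALENCE_HOLDS = all(
    iff(a, b) == conj(implies(a, b), implies(b, a))
    for a in VALUES for b in VALUES
)
```

Read this way, `implies(a, b)` is true exactly when `a` is below or equal to `b` in the order f < u < t, which is why the conjunction of both directions is true exactly when the two values coincide.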

A three-valued $\Sigma$-structure, $\mathcal{A}$, consists of a universe of values $A$, and an interpretation
of each function symbol by a total function (of adequate arity), and of each predicate
symbol by a total function on the set of the three boolean values $\{t, f, u\}$ (i.e.,
a partial relation). Hence, terms cannot be undefined, but atoms can be interpreted as
$u$. $Mod_\Sigma$ denotes the set of all three-valued $\Sigma$-structures. A $\Sigma$-structure $\mathcal{A} \in Mod_\Sigma$ is a
model of (or satisfies) a set of sentences $\Phi$ if, and only if, $\mathcal{A}(\varphi) = t$ for any sentence
$\varphi \in \Phi$. This is also denoted by $\mathcal{A} \models \Phi$. We will denote by $\mathcal{A} \models_\sigma \varphi$ that $\mathcal{A}$ satisfies
the sentence $\sigma(\varphi)$, resulting from the valuation $\sigma : free(\varphi) \to A$ of the formula $\varphi$.
Given a set $\Phi$ of $\Sigma$-sentences, $Mod_\Sigma(\Phi)$ is the subclass of $Mod_\Sigma$ formed by the models
of $\Phi$. Logical consequence $\Phi \models \varphi$ means that $\mathcal{A} \models \varphi$ holds for all $\mathcal{A} \in Mod_\Sigma(\Phi)$. Two
$\Sigma$-structures $\mathcal{A}$ and $\mathcal{B}$ are elementarily equivalent, denoted $\mathcal{A} \simeq \mathcal{B}$, if $\mathcal{A}(\varphi) = \mathcal{B}(\varphi)$ for
each first-order $\Sigma$-sentence $\varphi$. We denote by $EQ(\mathcal{A})$ the set of all $\Sigma$-structures that are
elementarily equivalent to $\mathcal{A}$.

A $\Sigma$-theory is a set of $\Sigma$-sentences closed under logical consequence. A theory can
be presented semantically or axiomatically. A semantic presentation is a class $\mathcal{C}$ of
$\Sigma$-structures. Then, the theory semantically presented by $\mathcal{C}$ is the set of all $\Sigma$-sentences
which are satisfied by $\mathcal{C}$:

$$Th(\mathcal{C}) = \{\varphi \in Sent_\Sigma \mid \text{for all } \mathcal{A} \in \mathcal{C},\ \mathcal{A}(\varphi) = t\}$$

An axiomatic presentation is a decidable set of axioms $Ax \subseteq Sent_\Sigma$. Then, the theory
axiomatically presented by $Ax$ is the set of all logical consequences of $Ax$:

$$Th(Ax) = \{\varphi \in Sent_\Sigma \mid Ax \models \varphi\}$$

A $\Sigma$-theory $\mathcal{T}$ is said to be complete if $\varphi \in \mathcal{T}$ or $\neg\varphi \in \mathcal{T}$ holds for each $\Sigma$-sentence $\varphi$.

2.1 Constraint Domains


A constraint logic program can be seen as a program where some function and
predicate symbols have a predefined meaning on a given domain, called the constraint
domain. In particular, according to the standard approach for defining the
class of CLP($\mathcal{X}$) programs ([10], [11]), a constraint domain $\mathcal{X}$ consists of five parts
$(\Sigma_{\mathcal{X}}, \mathcal{L}_{\mathcal{X}}, Ax_{\mathcal{X}}, \mathcal{D}_{\mathcal{X}}, solv_{\mathcal{X}})$, where $\Sigma_{\mathcal{X}} = (FS_{\mathcal{X}}, PS_{\mathcal{X}})$ is the constraint signature, i.e., the
set of symbols that are considered to be predefined; $\mathcal{L}_{\mathcal{X}}$ is the constraint language, i.e.,
the class of $\Sigma_{\mathcal{X}}$-formulas that can be used in programs; $\mathcal{D}_{\mathcal{X}}$ is the domain of computation,
i.e., a model defining the semantics of the symbols in $\Sigma_{\mathcal{X}}$; $Ax_{\mathcal{X}}$ is an axiomatization
of the domain, i.e., a decidable set of $\mathcal{L}_{\mathcal{X}}$-sentences such that $\mathcal{D}_{\mathcal{X}} \models Ax_{\mathcal{X}}$; and, finally,
$solv_{\mathcal{X}}$ is a constraint solver, i.e., an oracle that answers queries about constraints and
that is used for defining the operational semantics of programs. In general, constraint
solvers are expected to solve constraints, i.e., given a constraint $c$, one would expect
that the solver will provide the values that satisfy the constraint or that it returns an
equivalent constraint in solved form. However, in our case, we just need the solver to
answer (un)satisfiability queries. We consider that, given a constraint $c$, $solv_{\mathcal{X}}(c)$ may
return F, meaning that $c$ is not satisfiable, or it may answer T, meaning that $c$ is valid in
the constraint domain, i.e., that $\neg c$ is unsatisfiable. The solver may also answer U, meaning
that either the solver does not know the right answer or that the constraint is neither
valid nor unsatisfiable.
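In addition, a constraint domain $\mathcal{X}$ must satisfy:

- T, F, $t_1 = t_2 \in \mathcal{L}_{\mathcal{X}}$ (hence the equality symbol $=$ belongs to $PS_{\mathcal{X}}$) and $\mathcal{L}_{\mathcal{X}}$ is closed
  under variable renaming, existential quantification and conjunction. Moreover, the
  equality symbol $=$ is interpreted as the equality in $\mathcal{D}_{\mathcal{X}}$, and $Ax_{\mathcal{X}}$ includes the
  equality axioms for $=$.

- The solver does not take variable names into account, that is, for all renamings $\rho$,
  $solv_{\mathcal{X}}(c) = solv_{\mathcal{X}}(\rho(c))$.

- $Ax_{\mathcal{X}}$, $\mathcal{D}_{\mathcal{X}}$ and $solv_{\mathcal{X}}$ agree in the sense that:

  1. $\mathcal{D}_{\mathcal{X}}$ is a model of $Ax_{\mathcal{X}}$.
  2. For all $c \in \mathcal{L}_{\mathcal{X}} \cap Sent_{\Sigma_{\mathcal{X}}}$: $solv_{\mathcal{X}}(c) = T \Rightarrow Ax_{\mathcal{X}} \models c$.
  3. For all $c \in \mathcal{L}_{\mathcal{X}} \cap Sent_{\Sigma_{\mathcal{X}}}$: $solv_{\mathcal{X}}(c) = F \Rightarrow Ax_{\mathcal{X}} \models \neg c$.

Moreover, $solv_{\mathcal{X}}$ must be well-behaved, i.e., for any constraints $c_1$ and $c_2$:

1. $solv_{\mathcal{X}}(c_1) = solv_{\mathcal{X}}(c_2)$ if $\models c_1 \leftrightarrow c_2$.

2. If $solv_{\mathcal{X}}(c_1) = F$ and $\models c_1 \leftarrow c_2$ then $solv_{\mathcal{X}}(c_2) = F$.

In what follows, a constraint domain $\mathcal{X} = (\Sigma_{\mathcal{X}}, \mathcal{L}_{\mathcal{X}}, Ax_{\mathcal{X}}, \mathcal{D}_{\mathcal{X}}, solv_{\mathcal{X}})$ will be called
a $(\Sigma_{\mathcal{X}}, \mathcal{L}_{\mathcal{X}})$-constraint domain.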
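To make the three possible answers of a solver concrete, the following Python sketch implements a toy solver by brute force over a small finite domain. It is an illustration under stated assumptions, not the solver of any actual CLP system: the domain, the representation of constraints as Python predicates over a variable assignment, and the name `solv` are all choices of this sketch. Because it enumerates all valuations, this toy solver is complete, so it answers U only when the constraint is neither valid nor unsatisfiable.

```python
from itertools import product

def solv(constraint, variables, domain):
    """Answer 'F' if no valuation satisfies the constraint, 'T' if every
    valuation does (the constraint is valid), and 'U' otherwise.

    constraint: function from a dict {variable name: value} to bool
    variables:  list of the free variable names of the constraint
    domain:     finite, non-empty collection of values
    """
    results = [
        constraint(dict(zip(variables, values)))
        for values in product(domain, repeat=len(variables))
    ]
    if not any(results):
        return "F"
    if all(results):
        return "T"
    return "U"

DOMAIN = range(4)

# x < 0 is unsatisfiable over {0, 1, 2, 3}; x = x is valid; x < y is neither.
ANSWER_UNSAT = solv(lambda s: s["x"] < 0, ["x"], DOMAIN)
ANSWER_VALID = solv(lambda s: s["x"] == s["x"], ["x"], DOMAIN)
ANSWER_OPEN = solv(lambda s: s["x"] < s["y"], ["x", "y"], DOMAIN)
```

Renaming the variables of a constraint does not change the enumeration, so the renaming condition holds for this sketch by construction. Likewise, a constraint that implies an unsatisfiable one has no satisfying valuation either, which matches the second well-behavedness condition.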


2.2 Constraint Logic Programs 


A constraint logic program over a $(\Sigma_{\mathcal{X}}, \mathcal{L}_{\mathcal{X}})$-constraint domain $\mathcal{X}$ can be seen as a
generalization of a definite logic program. In particular, a constraint logic program consists
of rules $p \mathrel{:\!-} q_1, \ldots, q_n \,\Box\, c_1, \ldots, c_m$, where each $q_i$ is an atom and each $c_j$ is a constraint in
$\mathcal{L}_{\mathcal{X}}$, and where atoms have the form $q(t_1, \ldots, t_n)$, where $q$ is a user-defined predicate and
$t_1, \ldots, t_n$ are terms over $\Sigma_{\mathcal{X}}$. A program rule can be written, equivalently, in flat form

$$p(X_1, \ldots, X_n) \mathrel{:\!-} q_1, \ldots, q_n \,\Box\, c_1, \ldots, c_m, X_1 = t_1, \ldots, X_n = t_n$$

where $X_1, \ldots, X_n$ are fresh new variables. In what follows we will assume that constraint
logic programs consist only of flat rules.

The semantics of a $(\Sigma_{\mathcal{X}}, \mathcal{L}_{\mathcal{X}})$-logic program $P$ can be also seen as a generalization
of the semantics of a (non-constrained) logic program. In particular, in [10, 11], the
meaning of $P$ is given in terms of the usual three kinds of semantics.

The operational semantics is defined in terms of finite or infinite derivations
$S_1 \leadsto S_2 \leadsto \ldots \leadsto S_n \ldots$ where the states $S_i$ in these derivations are tuples $G_i \,\Box\, C_i$, where $G_i$ is a
goal (i.e., a sequence of atoms) and $C_i$ is a sequence of constraints (actually a constraint,
since constraints are closed under conjunction). In particular, from a state $S = G \,\Box\, C$ we
can derive the state $S' = G' \,\Box\, C'$ if there is a rule $p(X_1, \ldots, X_n) \mathrel{:\!-} G_0 \,\Box\, C_0$, and an atom
$p(t_1, \ldots, t_n)$ in $G$, where $X_1, \ldots, X_n$ are fresh new variables not occurring in $G \,\Box\, C$, such
that $G' = \langle G_0, (G \setminus p(t_1, \ldots, t_n)) \rangle$ and $C' = \langle C, C_0, X_1 = t_1, \ldots, X_n = t_n \rangle$ is satisfiable.
Then, given a derivation $S_1 \leadsto S_2 \leadsto \ldots \leadsto S_n$, with $S_n = G_n \,\Box\, C_n$, we say that $C_n$ is an
answer to the query $S_1 = G_1 \,\Box\, C_1$ if $G_n$ is the empty goal.

The logical semantics of $P$ is defined as the theory presented by $P \cup Ax_{\mathcal{X}}$.

Finally its algebraic semantics, $M(P, \mathcal{X})$, is defined as the least model of $P$ extending
$\mathcal{D}_{\mathcal{X}}$, in the sense that this model agrees with $\mathcal{D}_{\mathcal{X}}$ in the corresponding universe of
values and in the interpretation of the symbols in $\Sigma_{\mathcal{X}}$. It may be noted that $\Sigma$-structures
extending $\mathcal{D}_{\mathcal{X}}$ can be seen as subsets of $Base_P(\mathcal{D}_{\mathcal{X}})$, where $Base_P(\mathcal{D}_{\mathcal{X}})$ is the set of all
atoms of the form $p(\alpha_1, \ldots, \alpha_n)$, where $p$ is a user-defined predicate and $\alpha_1, \ldots, \alpha_n$ are
values in $\mathcal{D}_{\mathcal{X}}$. As in the standard case, the algebraic semantics of $P$ can be defined as
the least fixpoint of the immediate consequence operator $T_P : 2^{Base_P(\mathcal{D}_{\mathcal{X}})} \to 2^{Base_P(\mathcal{D}_{\mathcal{X}})}$
defined as follows:

$$T_P(I) = \{\sigma(p) \mid \sigma : free(p) \to \mathcal{D}_{\mathcal{X}} \text{ is a valuation},\ (p \mathrel{:\!-} a \,\Box\, c) \in P,\ I \models_\sigma a \text{ and } \mathcal{D}_{\mathcal{X}} \models_\sigma c\}$$
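The least fixpoint of the immediate consequence operator can be computed by straightforward iteration when the domain of computation is finite. The following Python sketch is an illustration under stated assumptions: the finite domain {0, ..., 6}, the toy program for a predicate `even`, the rule representation and the function names `tp` and `least_fixpoint` are all hypothetical choices made for this example, and valuations range over all variables of a rule.

```python
from itertools import product

def tp(rules, domain, interp):
    """One application of the immediate consequence operator to `interp`.

    Each rule is (head, body, variables, constraint): head and body atoms are
    pairs (predicate, tuple of variable names), `variables` lists all variables
    of the rule, and `constraint` maps a valuation (a dict) to True or False.
    """
    derived = set()
    for head, body, variables, constraint in rules:
        for values in product(domain, repeat=len(variables)):
            sigma = dict(zip(variables, values))
            if not constraint(sigma):
                continue                      # the domain does not satisfy c under sigma
            if all((p, tuple(sigma[v] for v in args)) in interp for p, args in body):
                p, args = head
                derived.add((p, tuple(sigma[v] for v in args)))
    return derived

def least_fixpoint(rules, domain):
    """Iterate the operator from the empty interpretation until it stabilises."""
    interp = set()
    while True:
        nxt = tp(rules, domain, interp)
        if nxt == interp:
            return interp
        interp = nxt

# Toy program (hypothetical):  even(X) :- [] X = 0.   even(X) :- even(Y) [] X = Y + 2.
DOMAIN = range(7)
RULES = [
    (("even", ("X",)), [], ("X",), lambda s: s["X"] == 0),
    (("even", ("X",)), [("even", ("Y",))], ("X", "Y"), lambda s: s["X"] == s["Y"] + 2),
]
MODEL = least_fixpoint(RULES, DOMAIN)
```

The iteration terminates here because the operator is monotone and the set of ground atoms over a finite domain is finite; over an infinite domain the least fixpoint is in general only reached in the limit.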











In [11] it is proved that the above three semantics are equivalent in the sense that:

- The operational semantics is sound with respect to the logical semantics. That is, if
  a goal $G$ has answer $c$ then $P \cup Ax_{\mathcal{X}} \models c \rightarrow G$.

- The operational semantics is also sound with respect to the algebraic semantics.
  That is, if a goal $G$ has answer $c$ then $M(P, \mathcal{X}) \models c \rightarrow G$.

- The operational semantics is complete with respect to the logical semantics. That is,
  if $P \cup Ax_{\mathcal{X}} \models c \rightarrow G$, then $G$ has answers $c_1, \ldots, c_n$ such that $Ax_{\mathcal{X}} \models c \rightarrow c_1 \vee \ldots \vee c_n$.

- The operational semantics is complete with respect to the algebraic semantics. That
  is, if $M(P, \mathcal{X}) \models_\sigma G$, where $\sigma : free(G) \to \mathcal{D}_{\mathcal{X}}$ is a valuation, then $G$ has an answer
  $c$ such that $\mathcal{D}_{\mathcal{X}} \models_\sigma c$.


2.3 A Functorial Semantics for Constraint Logic Programs 


The semantic definitions sketched in the previous subsection are, in our opinion, not
fully satisfactory. On one hand, a constraint logic program can be seen as a logic program
parameterized by the constraint domain. Then, we think that its semantics should
also be parameterized by the domain. This is not explicit in the semantics sketched
above. On the other hand, we think that the formulation of some of the previous equivalence
results could be found to be, in some sense, not fully satisfactory. Let us consider,
for instance, the last result, i.e., the completeness of the operational semantics with
respect to the algebraic semantics. In our opinion, a fully satisfactory result would have
said something like:

if $M(P, \mathcal{X}) \models_\sigma G$, where $\sigma : free(G) \to \mathcal{D}_{\mathcal{X}}$ is a valuation, then $G$ has an answer $c$ such
that $solv_{\mathcal{X}}(c) \neq F$.

However, this property will not hold unless the constraint solver $solv_{\mathcal{X}}$ is also complete
with respect to the computation domain.

In our opinion, each of the three semantics (logical, algebraic and operational se- 
mantics) of a constraint logic program should be some kind of mapping. Moreover, we 
can envision that the parameters of the logical definitions would be constraint theories. 
Similarly, the parameters for algebraic definitions would be computation domains. Fi- 
nally, the parameters for the operational definitions would be constraint solvers. In this 
context, proving the soundness and completeness of one semantics with respect to an- 
other one would mean comparing the corresponding mappings. In particular, a given 
semantics would be sound and complete with respect to another one if the two semantic 
mappings are in some sense equivalent. On the other hand, we believe that these map- 
pings are better studied if the given domains and codomains are not just sets or classes 
but categories, which means taking care of their underlying structure. As a consequence, 
these mappings would be defined as functors and not just as plain set-theoretic func- 
tions. 

In Section [5] the above ideas are fully developed for the case of constraint normal 
logic programs. The case of constraint logic programs can be seen as a particular case. 


3 Domain Constraints for Constraint Normal Logic Programs 


In this section, we provide a notion of constraint domain for constraint normal logic programming. The idea, as discussed in the introduction, is that this notion, together with a proper adaptation of the semantic constructions used for (unconstrained) normal logic programs, will provide an adequate semantic definition for constraint normal logic programs. In particular, the idea is that the logical semantics of a program should be given in terms of the (3-valued) Clark-Kunen completion of the program, the operational semantics in terms of some form of constructive negation [19,5,6], and the algebraic semantics in terms of some form of fixpoint construction (as in [19,6,15]). The main problem is that a straightforward extension of the notion of constraint domain introduced in Subsection 2.1 (for instance, just including negated atoms in the constraint languages) will not work, as the following example shows.


Example 1 Let P be the CLP(ℕ) program:

    q(z) :- □ z = 0
    q(v) :- q(x) □ v = x + 1

and assume that its logical semantics is given by its completion:

\[ \forall z\,\bigl(q(z) \leftrightarrow (z = 0 \lor \exists x\,(q(x) \land z = x + 1))\bigr). \]

This means, obviously, that q(n) should hold for each n. Actually, the model defined by the algebraic semantics seen in Subsection 2.1 would satisfy $\forall z\, q(z)$. Now consider that P is extended by the following definitions:

    r :- ¬q(x)
    s :- ¬r

whose completion is:

\[ (r \leftrightarrow \exists x\,(\lnot q(x))) \land (s \leftrightarrow \lnot r). \]

Now, the operational semantics, and also the ω-iteration of Fitting's operator [Z], would correspond to a three-valued structure extending ℕ, where both r and s are undefined and where, as before, q(n) holds for each n. Unfortunately, such a structure would not be a model of the completion of the program, since it satisfies $\forall z\, q(z)$ but satisfies neither $\lnot r$ nor $s$. ∎


The problem with the example above is that, if the algebraic semantics is defined by means of the ω-iteration of an immediate consequence operator, then, in many cases, the resulting structure would not be a model of the completion of the program. Otherwise, if we define the algebraic semantics in terms of some least (with respect to some order relation) model of the completion extending ℕ, then, in many cases, the operational semantics would not be complete with respect to that model. Actually, in some cases this model could be non (semi-)computable ([2], [8]).

In our opinion, the problem is related to the following observation. Let us suppose that $X = (\Sigma, \mathcal{L}, Ax, D, solv)$ and $X' = (\Sigma, \mathcal{L}, Ax, D', solv)$ are two constraint domains that only differ in their domains of computation, $D$ and $D'$, which are elementarily equivalent. Now, a program defined over any of these domains would show exactly the same behaviour, since both algebras satisfy exactly the same constraints, i.e., two structures that are elementarily equivalent should be considered indistinguishable as domains of computation for a given constraint domain. As a consequence, we may consider that the semantics of a program over two indistinguishable constraint domains should also be indistinguishable. However, if $P$ is a $(\Sigma, \mathcal{L})$-program, then $M(P,X)$ and $M(P,X')$ are not necessarily elementarily equivalent. In particular, if we consider the program P of Example 1 and we consider as constraint domain a non-standard model of the natural numbers ℕ', then we would have that $M(P, \mathbb{N}) \models \forall z\, q(z)$ but $M(P, \mathbb{N}') \not\models \forall z\, q(z)$.

We think that this problem is caused by considering that the domain of computation, $D_X$, of a constraint domain is a single structure. In the case of programs without negation this apparently works fine and it seems quite reasonable from an intuitive point of view. For instance, if we are writing programs over the natural numbers, it seems reasonable to think that the computation domain is the algebra of natural numbers. However, when dealing with negation, we think that the computation domain of a constraint domain should be defined in terms of the class of all the structures which are elementarily equivalent to a given one. To be precise, we reformulate the notion of constraint domain as follows:


Definition 2 A $(\Sigma_X, \mathcal{L}_X)$-constraint domain $X$ is a 5-tuple $(\Sigma_X, \mathcal{L}_X, Ax_X, Dom_X, solv_X)$, where $\Sigma_X = (FS_X, PS_X)$ is the constraint signature, $\mathcal{L}_X$ is the constraint language, $Dom_X = EQ(D_X)$ is the domain of computation, i.e., the class of all $\Sigma_X$-structures which are elementarily equivalent to a given structure $D_X$, $Ax_X$ is a decidable set of $\Sigma_X$-sentences such that $D_X \models Ax_X$, and $solv_X$ is a constraint solver, such that:

- $T, F, t_1 = t_2 \in \mathcal{L}_X$ (hence the equality symbol = belongs to $PS_X$) and $\mathcal{L}_X$ is closed under variable renaming, existential quantification, conjunction and negation. Moreover, the equality symbol = is interpreted as the equality in $Dom_X$ and $Ax_X$ includes the equality axioms for =.

- The solver does not take variable names into account, that is, for all variable renamings $\rho$, $solv_X(c) = solv_X(\rho(c))$.

- $Ax_X$, $Dom_X$ and $solv_X$ agree in the sense that:

  1. $D_X$ is a model of $Ax_X$.
  2. For all $c \in \mathcal{L}_X \cap Sent_{\Sigma_X}$: $solv_X(c) = T \Rightarrow Ax_X \models c$.
  3. For all $c \in \mathcal{L}_X \cap Sent_{\Sigma_X}$: $solv_X(c) = F \Rightarrow Ax_X \models \lnot c$.

As before, we assume that $solv_X$ is well-behaved, i.e., for any constraints $c_1$ and $c_2$:

1. $solv_X(c_1) = solv_X(c_2)$ if $\models c_1 \leftrightarrow c_2$.

2. If $solv_X(c_1) = F$ and $\models c_1 \leftarrow c_2$ then $solv_X(c_2) = F$.





4 Semantic Constructions for Constraint Normal Logic Programs 


Analogously to constraint logic programs, given a signature $\Sigma = (PS_\Sigma, FS_\Sigma)$, normal constraint logic $\Sigma$-programs over a constraint domain $X = (\Sigma_X, \mathcal{L}_X, Ax_X, Dom_X, solv_X)$ can be seen as a generalization of normal logic programs. So, a $\Sigma$-program now consists of clauses of the form $a \text{ :- } \ell_1,\ldots,\ell_m \,\Box\, c_1,\ldots,c_n$, where $a$ and the $\ell_i$, $i \in \{1,\ldots,m\}$, are a flat atom and flat literals, respectively, whose predicate symbols belong to $PS_\Sigma \setminus PS_X$, and the $c_j$, $j \in \{1,\ldots,n\}$, belong to $\mathcal{L}_X$. As before, we also assume that all clauses defining the same predicate $p$ have exactly the same head $p(X_1,\ldots,X_m)$.














4.1 Logical Semantics 
The standard logical meaning of a $\Sigma$-program $P$ is its (generalized) Clark's completion $Comp_X(P) = Ax_X \cup P^*$, where $P^*$ includes a sentence

\[ \forall \bar{z}\,\bigl(q(\bar{z}) \leftrightarrow ((G_1 \land c_1)^{\exists\setminus\bar{z}} \lor \ldots \lor (G_k \land c_k)^{\exists\setminus\bar{z}})\bigr) \]

for each $q \in PS_\Sigma \setminus PS_X$, and where $\{(q(\bar{z}) \text{ :- } G_1 \,\Box\, c_1),\ldots,(q(\bar{z}) \text{ :- } G_k \,\Box\, c_k)\}$ is the set of all the clauses in $P$ with head predicate $q$ (if this set is empty, the above sentence is simplified to $\forall \bar{z}\,(q(\bar{z}) \leftrightarrow F)$). In what follows, this set will be denoted by $Def_P(q)$. Intuitively, in this semantics we are considering that $Def_P(q)$ is a complete definition of the predicate $q$. A weaker logical meaning for the program $P$ is obtained by defining its semantics as $Ax_X \cup P^{\leftarrow}$, where $P^{\leftarrow}$ is the set including a sentence

\[ \forall \bar{z}\,\bigl(q(\bar{z}) \leftarrow ((G_1 \land c_1)^{\exists\setminus\bar{z}} \lor \ldots \lor (G_k \land c_k)^{\exists\setminus\bar{z}})\bigr) \]

for each $q \in PS_\Sigma \setminus PS_X$.


4.2 The BCN Operational Semantics 


In this section we generalize the BCN operational semantics introduced in and 
refined in in such a way that it can be used for any constraint domain. The BCN 
operational semantics is based on two operators originally introduced by Shepherdson 
to characterize Clark-Kunen’s semantics in terms of satisfaction of (equality) con- 
straints. Such operators exploit the definition of literals in the completion of programs 
and associate a constraint formula to each query. As a consequence, the answers are 
computed, on one hand, by a symbolic manipulation process that obtains the associated 
constraint(s) of the given query and, on the other hand, by a constraint checking process 
that deals with such constraint(s). In particular, the original version ([16]) of the BCN operational semantics works with programs restricted to the constraint domain of terms with equality.


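Definition 3 For any program $P$, the operators $T_k^P$ and $F_k^P$ associate a constraint to each query, as follows:

Let $Def_P(q) = \{q(\bar{z}) \text{ :- } \bar{\ell}^i \,\Box\, c^i \mid 1 \le i \le m\}$. Then:

\[ T_0^P(q(\bar{z})) = F \qquad T_{k+1}^P(q(\bar{z})) = \bigvee_{i=1}^{m} \exists \bar{y}^i\,(c^i \land T_k^P(\bar{\ell}^i)) \]
\[ F_0^P(q(\bar{z})) = F \qquad F_{k+1}^P(q(\bar{z})) = \bigwedge_{i=1}^{m} \forall \bar{y}^i\,(\lnot c^i \lor F_k^P(\bar{\ell}^i)) \]

For all $k \in \mathbb{N}$:

\[ T_k^P(T) = T \qquad F_k^P(T) = F \]
\[ T_k^P(\lnot q(\bar{z})) = F_k^P(q(\bar{z})) \qquad F_k^P(\lnot q(\bar{z})) = T_k^P(q(\bar{z})) \]
\[ T_k^P\Bigl(\bigwedge_{j \in J} \ell_j\Bigr) = \bigwedge_{j \in J} T_k^P(\ell_j) \qquad F_k^P\Bigl(\bigwedge_{j \in J} \ell_j\Bigr) = \bigvee_{j \in J} F_k^P(\ell_j) \]

For any $c \in \mathcal{L}_X$ and any $k \in \mathbb{N}$:

\[ T_k^P(c) = c \qquad F_k^P(c) = \lnot c \]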
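
To make the recursive shape of these operators concrete, here is a small Python sketch that unfolds them symbolically for the program of Example 1 (with the second clause written with head q(z), and extended with r :- ¬q(x)). Everything about the encoding is an assumption of this sketch: constraints are plain strings, local variables of a clause are renamed by appending the level k, and only trivial simplifications with T and F are performed. The renaming scheme is adequate for this small example and is not claimed to be a general capture-avoiding substitution.

```python
# Clauses are (pred, head_vars, local_vars, body_literals, constraint_template).
# A literal is (positive, pred, args); None as template encodes the constraint T.
PROGRAM = [
    ("q", ("z",), (), [], "{z} = 0"),
    ("q", ("z",), ("x",), [(True, "q", ("x",))], "{z} = {x} + 1"),
    ("r", (), ("x",), [(False, "q", ("x",))], None),
]


def _join(parts, op, unit, zero):
    """Join formulas with op, dropping the unit and collapsing on the zero."""
    parts = [p for p in parts if p != unit]
    if zero in parts:
        return zero
    if not parts:
        return unit
    if len(parts) == 1:
        return parts[0]
    return f" {op} ".join(f"({p})" for p in parts)


def _and(parts):
    return _join(parts, "&", "T", "F")


def _or(parts):
    return _join(parts, "|", "F", "T")


def _neg(c):
    return {"T": "F", "F": "T"}.get(c, f"~({c})")


def _quant(q, variables, body):
    if not variables or body in ("T", "F"):
        return body
    return f"{q} {','.join(variables)}.({body})"


def _instances(pred, args, k):
    """Yield (local_vars, constraint, body) for each clause defining pred."""
    for head, hvars, locs, body, template in PROGRAM:
        if head != pred:
            continue
        ren = dict(zip(hvars, args))
        ren.update({v: f"{v}{k}" for v in locs})  # rename locals by level
        c = template.format(**ren) if template else "T"
        lits = [(pos, p, tuple(ren[a] for a in la)) for pos, p, la in body]
        yield [ren[v] for v in locs], c, lits


def T(k, lit):
    """Constraint associated with the truth of a literal at level k."""
    pos, pred, args = lit
    if not pos:
        return F(k, (True, pred, args))
    if k == 0:
        return "F"
    return _or([_quant("exists", ys, _and([c] + [T(k - 1, l) for l in body]))
                for ys, c, body in _instances(pred, args, k)])


def F(k, lit):
    """Constraint associated with the falsity of a literal at level k."""
    pos, pred, args = lit
    if not pos:
        return T(k, (True, pred, args))
    if k == 0:
        return "F"
    return _and([_quant("forall", ys, _or([_neg(c)] + [F(k - 1, l) for l in body]))
                 for ys, c, body in _instances(pred, args, k)])


if __name__ == "__main__":
    q_z = (True, "q", ("z",))
    for k in range(3):
        print(k, T(k, q_z))
    print(F(1, q_z))
    print(T(2, (True, "r", ())))
```

In this encoding the level-2 constraint for r asks for some x2 that is neither 0 nor a successor, which has no solution in the natural numbers; this matches the observation that r is never derived at any finite level.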


Definition 4 Let $P$ be a program and $solv_X$ a constraint solver. A BCN($P$, $solv_X$)-derivation step is obtained by applying the following derivation rule:

(R) $\bar{\ell}_1, \bar{\ell}_2 \,\Box\, d$ is BCN($P$, $solv_X$)-derived from $\bar{\ell}_1, \ell(\bar{x}), \bar{\ell}_2 \,\Box\, c$ if there exists $k > 0$ such that $d = T_k^P(\ell(\bar{x})) \land c$ and $solv_X(d^{\exists}) \neq F$.

Definition 5 Let $P$ be a program and $solv_X$ a constraint solver.

1. A BCN($P$, $solv_X$)-derivation from the query $L$ is a succession of BCN($P$, $solv_X$)-derivation steps of the form $L \leadsto_{(P,solv_X)} \cdots \leadsto_{(P,solv_X)} L'$. Then, $L \leadsto^{n}_{(P,solv_X)} L'$ means that the query $L'$ is BCN($P$, $solv_X$)-derived from the query $L$ in $n$ BCN($P$, $solv_X$)-derivation steps.

2. A finite BCN($P$, $solv_X$)-derivation $L \leadsto^{n}_{(P,solv_X)} L'$ is a successful derivation if $L' = \Box\, c$. In this case, $c^{\exists\setminus free(L)}$ is the corresponding BCN($P$, $solv_X$)-computed answer.

3. A query $L = \bar{\ell} \,\Box\, c$ is a BCN($P$, $solv_X$)-failed query if $solv_X((c \to F_k^P(\bar{\ell}))^{\forall}) = T$ for some $k > 0$ such that $solv_X(F_k^P(\bar{\ell})^{\exists}) \neq F$.


























A selection rule is a function selecting a literal in a query and, whenever $solv_X$ is well-behaved, BCN($P$, $solv_X$) is independent of the selection rule used. To prove this assertion we follow the strategy used in , so we first prove the next lemma.


Lemma 6 (Switching Lemma) Let $P$ be a program and $solv_X$ be a well-behaved solver. Let $L$ be a query, $\ell_1, \ell_2$ be literals in $L$ and let $L \leadsto_{(P,solv_X)} L_1 \leadsto_{(P,solv_X)} L'$ be a non-failed derivation in which $\ell_1$ has been selected in $L$ and $\ell_2$ in $L_1$. Then there is a derivation $L \leadsto_{(P,solv_X)} L_2 \leadsto_{(P,solv_X)} L''$ in which $\ell_2$ has been selected in $L$ and $\ell_1$ in $L_2$, and $L'$ and $L''$ are identical up to reordering of their constraint component.


Theorem 7 (Independence of the selection rule) Let $P$ be a program and $solv_X$ a well-behaved solver. Let $L$ be a query and suppose that there exists a successful BCN($P$, $solv_X$)-derivation from $L$ with computed answer $c$. Then, using any selection rule $R$ there exists another successful BCN($P$, $solv_X$)-derivation from $L$ of the same length with an answer which is a reordering of $c$.


Next, we establish the basis for relating the BCN($P$, $solv_X$) operational semantics to the logical semantics of a particular class of constraint logic programs. The proposition below provides the basis for proving soundness and completeness of the semantics.


Proposition 8 Let $\Sigma = (FS_X, PS_X \cup PS)$ be an extension of a given signature of constraints $\Sigma_X = (FS_X, PS_X)$ by a set of predicates $PS$, and let $P$ be a $\Sigma$-program. For each $\Sigma_X$-theory of constraints $Ax_X$, each conjunction of $\Sigma$-literals $\bar{\ell}$ and each $k$ in ℕ:

\[ P^* \cup Th(Ax_X) \models (T_k^P(\bar{\ell}) \to \bar{\ell})^{\forall} \]





4.3 Fixpoint Semantics 


According to what is argued in Section 3, we consider the domain $(Dom_\Sigma/{\equiv}, \preceq)$ for computing immediate consequences, defined as follows. Let $Dom_\Sigma$ be the class of three-valued $\Sigma$-interpretations which are extensions of models in $Dom_X$. Then, as it is done in to extend to the general constraint case, we consider Fitting's ordering on $Dom_\Sigma$ interpreted in the following sense: for all partial interpretations $A, B \in Dom_\Sigma$,

\[ A \preceq B \ \text{ iff, for each } \mathcal{L}_X\text{-constraint } c(\bar{x}) \text{ and each } \Sigma\text{-literal } \ell(\bar{x}),\quad A((c \to \ell)^{\forall}) = t \ \Rightarrow\ B((c \to \ell)^{\forall}) = t \]

It is quite easy to see that $(Dom_\Sigma, \preceq)$ is a preorder. Therefore, we consider the equivalence relation $\equiv$ induced by $\preceq$ ($A \equiv B$ if, and only if, $A \preceq B$ and $B \preceq A$), and the induced partial order

\[ [A], [B] \in Dom_\Sigma/{\equiv}: \quad [A] \preceq [B] \ \text{ iff } \ A \preceq B \]

to build a cpo $(Dom_\Sigma/{\equiv}, \preceq)$ with a bottom class $[\bot_\Sigma]$ such that for each $A \in [\bot_\Sigma]$ we have that $A((c \to \ell)^{\forall}) \neq t$ for all $\mathcal{L}_X$-constraints $c(\bar{x})$ and all $\Sigma$-literals $\ell(\bar{x})$. That is, the set of goals of the form $(c \to \ell)^{\forall}$ satisfied by the models in $[\bot_\Sigma]$ is empty.


Proposition 9 $(Dom_\Sigma/{\equiv}, \preceq)$ is a cpo with respect to $\preceq$, and the equivalence class $[\bot_\Sigma]$ is its bottom element.


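Definition 10 (Immediate consequence operator $T_P^{Dom_X}$) Let $P$ be a $\Sigma$-program. The immediate consequence operator $T_P^{Dom_X} : Dom_\Sigma/{\equiv} \to Dom_\Sigma/{\equiv}$ is defined for each $[A] \in Dom_\Sigma/{\equiv}$ as

\[ T_P^{Dom_X}([A]) = [\Phi_P^{D_X}(A)] \]

where $D_X$ is the distinguished domain model in the class $Dom_X$, $A$ is any model in $[A]$, and $[\Phi_P^{D_X}(A)]$ is the $\equiv$-class of models such that for each $\mathcal{L}_X$-constraint $c(\bar{x})$ and each $\Sigma$-atom $p(\bar{x})$,

(i) $\Phi_P^{D_X}(A)((c \to p)^{\forall}) = t$ if, and only if, there are (renamed versions of) clauses $\{p(\bar{x}) \text{ :- } \ell_1^i,\ldots,\ell_{n_i}^i \,\Box\, d^i \mid 1 \le i \le m\} \subseteq Def_P(p)$ and $D_X$-satisfiable constraints $\{c_j^i \mid 1 \le i \le m \land 1 \le j \le n_i\}$ such that
  - $A((c_j^i \to \ell_j^i)^{\forall}) = t$
  - $D_X\bigl((c \to \bigvee_{1 \le i \le m} \exists \bar{y}^i (\bigwedge_{1 \le j \le n_i} c_j^i \land d^i))^{\forall}\bigr) = t$

(ii) $\Phi_P^{D_X}(A)((c \to \lnot p)^{\forall}) = t$ if, and only if, for each (renamed version of a) clause in $\{p(\bar{x}) \text{ :- } \ell_1^i,\ldots,\ell_{n_i}^i \,\Box\, d^i \mid 1 \le i \le m\} = Def_P(p(\bar{x}))$ there is a $J_i \subseteq \{1,\ldots,n_i\}$ and $D_X$-satisfiable constraints $\{c_j^i \mid 1 \le i \le m \land j \in J_i\}$ such that
  - $A((c_j^i \to \lnot\ell_j^i)^{\forall}) = t$
  - $D_X\bigl((c \to \bigwedge_{1 \le i \le m} \forall \bar{y}^i (\bigvee_{j \in J_i} c_j^i \lor \lnot d^i))^{\forall}\bigr) = t$

where, for each $i \in \{1,\ldots,m\}$, $\bar{y}^i$ are the free variables in $\{\ell_1^i,\ldots,\ell_{n_i}^i, d^i\}$ not in $\bar{x}$.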


Remark 11 In the definition of the operator $\Phi_P^{D_X}$, we could choose any other model in $Dom_X$ instead of $D_X$, since all of them are elementarily equivalent and the domain is just used for constraint satisfaction checking. Similarly, $A$ could be any other model in $[A]$, since it is used for checking satisfaction of sentences of the form $(c \to \ell)^{\forall}$. Moreover, models in a $\equiv$-class $[\Phi_P^{D_X}(A)]$ are elementarily equivalent in their restrictions to $\Sigma_X$. In fact, $[\Phi_P^{D_X}(A)]|_{\Sigma_X} = Dom_X$, since all classes in $Dom_\Sigma$ are (conservative) predicative extensions of $Dom_X$ and the operator $T_P^{Dom_X}$ does not compute new consequences from $\Sigma_X$.

In what follows we will prove that $T_P^{Dom_X}$ is continuous in the cpo $Dom_\Sigma/{\equiv}$. As a consequence, it has an effectively computable least fixpoint.


Theorem 12 $T_P^{Dom_X}$ is continuous in the cpo $(Dom_\Sigma/{\equiv}, \preceq)$, so it has a least fixpoint $T_P^{Dom_X}\uparrow\omega = \bigsqcup_{n} [\Phi_P^{D_X}\uparrow n]$.


However, it is important to notice that, as we will show in Example 13,

\[ \bigsqcup_{n} [\Phi_P^{D_X}\uparrow n] \neq [\Phi_P^{D_X}\uparrow\omega] \]

In fact, the operator $\Phi_P^{D_X}$ can be considered a variant of Stuckey's immediate consequence operator in [19], so it inherits its drawbacks. On the one hand, $\Phi_P^{D_X}$ is monotonic but not continuous. On the other hand, it will have different behavior depending on the constraint domain in $Dom_X$ that is predicatively extended.


Example 13 Consider the CNLP(ℕ)-program from Example 1:

    q(z) :- □ z = 0
    q(v) :- q(x) □ v = x + 1
    r :- ¬q(x)

First, let us look at the behaviour of the operator $\Phi$:

- $\Phi_P^{\mathbb{N}}\uparrow\omega$ would be the model extending ℕ where r is undefined and all the sentences $\{(z = n \to q(z))^{\forall} \mid n \ge 0\}$ are true, so the sentence $\forall z.q(z)$ will be evaluated as true in $\Phi_P^{\mathbb{N}}\uparrow\omega$. This is not a fixpoint, since we can iterate once more to obtain a different model $\Phi_P^{\mathbb{N}}\uparrow(\omega + 1)$ where $\lnot r$ is true.

- In contrast, if we consider any non-standard model $M$ elementarily equivalent to ℕ, the sentence $\forall z.q(z)$ will be evaluated as undefined in $\Phi_P^{M}\uparrow\omega$, so no more consequences will be obtained if we iterate once more.

Now we can compare with the behaviour of $T_P^{EQ(\mathbb{N})}$:

Similar to the first case, $T_P^{EQ(\mathbb{N})}\uparrow\omega$ is the class of $\equiv$-equivalent models extending $EQ(\mathbb{N})$, where r is undefined and all the sentences

\[ \{(z = n \to q(z))^{\forall} \mid n \ge 0\} \]

are true. But now this is a fixpoint, in contrast to what happens with any other operator working over just one standard model. In particular, it is not difficult to see that the sentence $\forall z.q(z)$ is never satisfied (by models) in $[\Phi_P^{\mathbb{N}}\uparrow k]$ for any $k$. This is because we are also considering non-standard models (as the predicative extension of the above $M$) at each iteration. Therefore, as a consequence of the definition of $\bigsqcup$, we have that $\forall z.q(z)$ is not satisfied in

\[ T_P^{EQ(\mathbb{N})}\uparrow\omega = \bigsqcup_{k} [\Phi_P^{\mathbb{N}}\uparrow k] \]

Finally, as a consequence of the continuity of $T_P^{Dom_X}$, we can extend a result from Stuckey [19] related to the satisfaction of the logical consequences of the completion in any ordinal iteration of $\Phi_P^{D_X}$, up to the ω-th iteration of $T_P^{Dom_X}$ (its least fixpoint):


Theorem 14 (Extended Theorem of Stuckey) Let $Th(Dom_X)$ be the complete theory of $Dom_X$. For each $\Sigma$-goal $\bar{\ell} \,\Box\, c$:

1. $P^* \cup Th(Dom_X) \models_3 (c \to \bar{\ell})^{\forall} \ \Leftrightarrow\ \forall A \in T_P^{Dom_X}\uparrow\omega : A((c \to \bar{\ell})^{\forall}) = t$
2. $P^* \cup Th(Dom_X) \models_3 (c \to \lnot\bar{\ell})^{\forall} \ \Leftrightarrow\ \forall A \in T_P^{Dom_X}\uparrow\omega : A((c \to \lnot\bar{\ell})^{\forall}) = t$





5 Functorial Semantics 


As introduced in Subsection 2.3, one basic idea in this work is to formulate the constructions associated to the definition of the operational, least fixpoint and logical semantics of constraint normal logic programs in functorial terms. This allows us to separate the study of the properties satisfied by these three semantic constructions from the classic comparisons of three kinds of semantics of programs over a specific constraint domain. Moreover, once the equivalence of semantic constructions is (as intended) obtained, the classical completeness results that can be obtained depending on the relations among solvers, theories and domains are just consequences of the functorial properties.

Comparing these semantic functors is not straightforward since, intuitively, their domains and codomains are different categories. We can see the logical semantics of a $(\Sigma_X, \mathcal{L}_X)$-constraint logic program $P$ as a mapping (a functor), let us denote it by $LOG_P$, whose arguments are logical theories and whose results are also logical theories. The algebraic semantics of $P$, denoted $ALG_P$, can be seen as a functor that takes as arguments logical structures and returns as results logical structures. Finally, the operational semantics of $P$, denoted $OP_P$, can be considered to take as arguments constraint solvers and return as results (for instance) interpretations of computed answers.

We solve this problem by representing all the semantic domains involved in terms of sets of formulas. This is a quite standard approach in the area of Logic Programming where, for instance, (finitely generated) models are often represented as Herbrand structures (i.e., as classes of ground atoms) rather than as algebraic structures. One could criticize this approach in the framework of constraint logic programming, since such a class does not faithfully represent a single model (the constraint domain of computation $Dom_X$) but a class of models. However, we have argued previously that, when dealing with negation, a constraint domain of computation should not be a single model, but the class of models which are elementarily equivalent to $Dom_X$. In this sense, one may note that a class of elementarily equivalent models is uniquely represented by a complete theory. However, since we are dealing with three-valued logic, we are going to represent model classes, theories and solvers as pairs of sets of sentences, rather than just as single sets.

In what follows, we present the categorical setting required for our purposes. More precisely, we first need to define the categories associated to solvers, computation domains and theories (axiomatizable domains). Then, we will define the category which properly represents the semantics of programs. Finally, we will define the three functors that respectively represent the operational, logical and algebraic semantics of a constraint normal logic program.


Definition 15 Given a signature $\Sigma_X$, a $\Sigma_X$-pre-theory $M$ is a pair of sets of $\Sigma_X$-sentences $(M_t, M_f)$.


Remarks and Definitions 16


1. Given a solver $solv_X$ of a given language $\mathcal{L}_X$ of $\Sigma_X$-constraints, we will denote by $M_{solv_X}$ the pre-theory associated to $solv_X$, i.e., the pair $(M_t, M_f)$ where $M_t$ is the set of all constraints $c \in \mathcal{L}_X$ such that $solv_X(c) = T$ and $M_f$ is the set of all constraints $c \in \mathcal{L}_X$ such that $solv_X(c) = F$.

2. Similarly, given a set of axioms $Ax_X$ of a given language $\mathcal{L}_X$ of $\Sigma_X$-constraints, we will denote by $M_{Ax_X}$ the theory associated to $Ax_X$.

3. Finally, given a computation domain $Dom_X$ of a given language $\mathcal{L}_X$ of $\Sigma_X$-constraints, we will denote by $M_{Dom_X}$ the theory associated to $Dom_X$, i.e., the pair $(M_t, M_f)$ where $M_t$ is the set of sentences satisfied by $Dom_X$ and $M_f$ is the set of sentences which are false in $Dom_X$. Note that, since constraint domains are typically two-valued, $M_t$ would typically be a complete theory and, therefore, $M_f$ is the complement of $M_t$.


For the sake of simplicity, given a pre-theory $M$, we will write $M(c) = t$ to mean $c \in M_t$; $M(c) = f$ to mean $c \in M_f$; and $M(c) = u$ otherwise.


Now, according to the above ideas, we will define categories to represent constraint 
solvers, computation domains and domain axiomatizations. Also, following similar 
ideas we are going to define a category of semantic domains for programs. In this case, 
we will define the semantics in terms of sets of formulas. However, we will restrict ourselves to sets of answers, i.e., formulas of the form $c \to G$, where $G$ is any goal.


Definition 17 (Categories for Constraint Domains and Program Interpretations) Given a signature $\Sigma_X$ we can define the following categories:


1. The category of $\Sigma_X$-pre-theories, $PreTh_{\Sigma_X}$ (or just PreTh if $\Sigma_X$ is clear from the context), is defined as follows:
   - Its class of objects is the class of $\Sigma_X$-pre-theories.
   - For each pair of objects $M$ and $M'$ there is a morphism from $M$ to $M'$, noted just by $M \preceq_c M'$, if $M_t \subseteq M'_t$ and $M_f \subseteq M'_f$.
2. $Th_{\Sigma_X}$ (or just Th) is the full subcategory of $PreTh_{\Sigma_X}$ whose objects are theories.
3. $CompTh_{\Sigma_X}$ (or just CompTh) is the full subcategory of $PreTh_{\Sigma_X}$ whose objects are complete theories.
4. Given a constraint language $\mathcal{L}_X$ and a signature $\Sigma$ extending $\Sigma_X$, $ProgInt^{\Sigma}_{(\Sigma_X, \mathcal{L}_X)}$ (or just ProgInt if $\Sigma$, $\Sigma_X$ and $\mathcal{L}_X$ are clear from the context) is the category where:
   - Its objects are sets of sentences $(c \to \bar{\ell})^{\forall}$ or $(c \to \lnot\bar{\ell})^{\forall}$, where $c \in \mathcal{L}_X$ and $\bar{\ell}$ is a conjunction of $\Sigma$-literals.
   - For each pair of objects $A$ and $A'$ there is a morphism from $A$ to $A'$, noted just by $A \preceq A'$, if $A \subseteq A'$, i.e.:
     - $\Sigma_X \subseteq \Sigma$; and
     - for each $(FS_X, PS_X \cup PS)$-literal $\ell(\bar{x})$ and $\mathcal{L}_X$-formula $c(\bar{x})$, $A((c \to \ell)^{\forall}) = t$ implies $A'((c \to \ell)^{\forall}) = t$, and $A((c \to \lnot\ell)^{\forall}) = t$ implies $A'((c \to \lnot\ell)^{\forall}) = t$.


As pointed out before, this categorical formulation allows us to speak about relations among solvers, domains and theories by establishing morphisms among them in the common category PreTh, in such a way that the morphism between two objects represents the relation "agrees with" (or completeness if they are seen in the reverse sense). To be more precise, given a constraint domain $X = (\Sigma_X, \mathcal{L}_X, Ax_X, Dom_X, solv_X)$, we can reformulate the conditions (in Section 2.1) required among $solv_X$, $Dom_X$ and $Ax_X$ as:

\[ M_{solv_X} \preceq_c M_{Ax_X} \preceq_c M_{Dom_X} \]

in PreTh. That is, since $Dom_X$ must be a model of $Ax_X$, there is a morphism from $M_{Ax_X}$ to $M_{Dom_X}$. Moreover, since $solv_X$ must agree with $Ax_X$, there is a morphism from $M_{solv_X}$ to $M_{Ax_X}$. Then, by transitivity, $solv_X$ agrees with $Dom_X$, so there is a morphism from $M_{solv_X}$ to $M_{Dom_X}$. In addition, we can also reformulate other conditions in these terms:

- $solv_X$ is $Ax_X$-complete (respectively, $Dom_X$-complete) if, and only if, $M_{Ax_X} \preceq_c M_{solv_X}$ (respectively, $M_{Dom_X} \preceq_c M_{solv_X}$).

- $Ax_X$ completely axiomatizes $Dom_X$ if, and only if, $M_{Dom_X} \preceq_c M_{Ax_X}$, so, as expected, $M_{Ax_X} = M_{Dom_X}$.
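
The "agrees with" ordering on pre-theories is simply componentwise inclusion, which the following Python sketch spells out. The class name, the function name and the three tiny pre-theories are hypothetical: sentences are represented as opaque strings, and the sets listed for the solver, the axioms and the domain are small illustrative fragments, not actual theories of the natural numbers.

```python
from typing import FrozenSet, NamedTuple


class PreTheory(NamedTuple):
    """A pre-theory: sentences taken as true and sentences taken as false."""
    true: FrozenSet[str]
    false: FrozenSet[str]

    def value(self, sentence: str) -> str:
        """Return 't', 'f' or 'u' for a sentence, as in M(c) = t / f / u."""
        if sentence in self.true:
            return "t"
        if sentence in self.false:
            return "f"
        return "u"


def agrees_with(m: PreTheory, m2: PreTheory) -> bool:
    """Morphism test in PreTh: both components of m are included in those of m2."""
    return m.true <= m2.true and m.false <= m2.false


# Illustrative fragments only (assumed data, not real theories).
M_SOLV = PreTheory(frozenset({"0 = 0"}), frozenset({"0 = 1"}))
M_AX = PreTheory(
    frozenset({"0 = 0", "forall x. exists y. y = x + 1"}),
    frozenset({"0 = 1"}),
)
M_DOM = PreTheory(M_AX.true, frozenset({"0 = 1", "exists x. x + x = 1"}))

if __name__ == "__main__":
    print(agrees_with(M_SOLV, M_AX), agrees_with(M_AX, M_DOM))
    print(agrees_with(M_DOM, M_SOLV))  # the solver is not complete here
```

In this toy data the solver leaves "exists x. x + x = 1" undefined while the domain fragment makes it false, so the chain from solver to axioms to domain holds but the reverse (completeness) direction does not.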


Finally, we will define the three functors that represent, for a given program P, its 
operational, its algebraic or least fixpoint, and its logical semantics. 


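Definition 18 (Functorial semantics) Let $P$ be a $\Sigma$-program. We can define three functors $OP_P : PreTh \to ProgInt$, $ALG_P : CompTh \to ProgInt$ and $LOG_P : Th \to ProgInt$ such that:


a) $OP_P$, $ALG_P$ and $LOG_P$ assign objects $M$ in their corresponding source category to objects in ProgInt, in the following way:


1. Operational Semantics:

\[ OP_P(M) = \{(c \to \bar{\ell})^{\forall} \mid M(c^{\exists}) \neq f \text{ and there is a BCN}(P, M)\text{-derivation for } \bar{\ell} \,\Box\, c \text{ with computed answer } d \text{ such that } M((c \to d)^{\forall}) = t\} \ \cup\ \{(c \to \lnot\bar{\ell})^{\forall} \mid \bar{\ell} \,\Box\, c \text{ is a BCN}(P, M)\text{-failed goal}\} \]

2. Least Fixpoint Semantics:

\[ ALG_P(M) = \{(c \to \bar{\ell})^{\forall} \mid M(c^{\exists}) \neq f \ \land\ T_P^{M}\uparrow\omega \models (c \to \bar{\ell})^{\forall}\} \ \cup\ \{(c \to \lnot\bar{\ell})^{\forall} \mid M(c^{\exists}) \neq f \ \land\ T_P^{M}\uparrow\omega \models (c \to \lnot\bar{\ell})^{\forall}\} \]

3. Logical Semantics:

\[ LOG_P(M) = \{(c \to \bar{\ell})^{\forall} \mid M(c^{\exists}) \neq f \ \land\ P^* \cup Th(M) \models (c \to \bar{\ell})^{\forall}\} \ \cup\ \{(c \to \lnot\bar{\ell})^{\forall} \mid M(c^{\exists}) \neq f \ \land\ P^* \cup Th(M) \models (c \to \lnot\bar{\ell})^{\forall}\} \]

b) To each pair of objects $M$ and $M'$ such that $M \preceq_c M'$ in the corresponding source category, $F \in \{ALG_P, LOG_P\}$ assigns the morphism $F(M) \preceq F(M')$ in ProgInt. However, $OP_P$ is contravariant, i.e., $M \preceq_c M'$ in PreTh implies $OP_P(M') \preceq OP_P(M)$ in ProgInt.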





It is easy to see that $ALG_P$ and $LOG_P$ are functors, as a straightforward consequence of the fact that morphisms are partial orders and of the monotonicity of the operator $T_P^{Dom_X}$ and of the logic, respectively. The contravariance of $OP_P$ is a consequence of the fact that the BCN-derivation process only makes unsatisfiability queries to the solver to prune derivations. This means that when $M_f$ is larger the derivation process prunes more derivation sequences.

Now, given a $(\Sigma_X, \mathcal{L}_X)$-program $P$, we can define the semantics of $P$ as

\[ [\![P]\!] = (OP_P, ALG_P, LOG_P) \]


6 Equivalence of Semantics 


In this section, we will first prove that the semantic constructions represented by the functors $OP_P$, $ALG_P$ and $LOG_P$ are equivalent in the sense that for each object $M$ in CompTh, $OP_P(M)$, $ALG_P(M)$ and $LOG_P(M)$ are the same object in ProgInt.

Then, we will show the completeness of the operational semantics with respect to the algebraic and logical semantics just as a consequence of the fact that functors preserve the relations from their domains into their codomains.


Theorem 19 Let $P$ be a $\Sigma$-program. For each object $M$ in CompTh,

\[ OP_P(M) = ALG_P(M) = LOG_P(M) \]

Finally, we present the usual completeness results of the operational semantics that can be obtained when the domains, theories and solvers are not equivalent. As we pointed out before, these results can be obtained just as a consequence of working with functors. In particular, since $M_{solv_X} \preceq_c M_{Dom_X}$, the contravariance of $OP_P$ gives $OP_P(M_{Dom_X}) \preceq OP_P(M_{solv_X})$ and, since $OP_P(M_{Dom_X}) = ALG_P(M_{Dom_X})$ by Theorem 19, this implies that $ALG_P(M_{Dom_X}) \preceq OP_P(M_{solv_X})$, and similarly for the logical semantics. That is:


Corollary 20 (Completeness of the operational semantics) For any program $P$, $OP_P$ is complete with respect to $ALG_P$ and with respect to $LOG_P$. That is, for each constraint domain $(\Sigma_X, \mathcal{L}_X, Ax_X, Dom_X, solv_X)$:

- $ALG_P(M_{Dom_X}) \preceq OP_P(M_{solv_X})$
- $LOG_P(M_{Ax_X}) \preceq OP_P(M_{solv_X})$


Appendix








Proof of Lemma 6. Let $L$ be $\bar{\ell}_1, \ell_1, \bar{\ell}_2, \ell_2, \bar{\ell}_3 \,\Box\, c$. Then, $L_1 = \bar{\ell}_1, \bar{\ell}_2, \ell_2, \bar{\ell}_3 \,\Box\, c \land T_k^P(\ell_1)$, $k > 0$, and $solv_X((c \land T_k^P(\ell_1))^{\exists}) \neq F$, and $L' = \bar{\ell}_1, \bar{\ell}_2, \bar{\ell}_3 \,\Box\, c \land T_k^P(\ell_1) \land T_{k'}^P(\ell_2)$, $k > 0$, $k' > 0$ and $solv_X((c \land T_k^P(\ell_1) \land T_{k'}^P(\ell_2))^{\exists}) \neq F$.

Now, to construct the derivation $L \leadsto_{(P,solv_X)} L_2 \leadsto_{(P,solv_X)} L''$ in which $\ell_2$ is selected first (in $L$) and $\ell_1$ afterwards (in $L_2$), we choose $L_2 = \bar{\ell}_1, \ell_1, \bar{\ell}_2, \bar{\ell}_3 \,\Box\, c \land T_{k'}^P(\ell_2)$ and $L'' = \bar{\ell}_1, \bar{\ell}_2, \bar{\ell}_3 \,\Box\, c \land T_{k'}^P(\ell_2) \land T_k^P(\ell_1)$. Since $solv_X((c \land T_k^P(\ell_1) \land T_{k'}^P(\ell_2))^{\exists}) \neq F$, by the well-behavedness property of $solv_X$ we know that $solv_X((c \land T_{k'}^P(\ell_2))^{\exists}) \neq F$ and $solv_X((c \land T_{k'}^P(\ell_2) \land T_k^P(\ell_1))^{\exists}) \neq F$. Hence, $L \leadsto_{(P,solv_X)} L_2 \leadsto_{(P,solv_X)} L''$ is a valid BCN($P$, $solv_X$)-derivation. ∎

Proof of Theorem 7. The proof follows by induction on the length, $n$, of the BCN($P$, $solv_X$)-derivation. The base step, $n = 0$, trivially holds. Assume that the statement holds for $n' < n$. Now, to prove the inductive step, consider the BCN($P$, $solv_X$)-derivation

\[ L \leadsto_{(P,solv_X)} L_1 \leadsto_{(P,solv_X)} \cdots \leadsto_{(P,solv_X)} L_{n-1} \leadsto_{(P,solv_X)} \Box\, c \]

Since this is a successful derivation, each literal in $L$ is selected at some point of the derivation. Let us consider the literal $\ell$ in $L$ and suppose that it is selected in $L_i$. By applying Lemma 6 $i$ times we can reorder the above derivation to obtain the following one: $L \leadsto_{(P,solv_X)} L'_1 \leadsto_{(P,solv_X)} \cdots \leadsto_{(P,solv_X)} L'_{n-1} \leadsto_{(P,solv_X)} \Box\, c'$, such that $\ell$ is selected in $L$ and $c'$ is a reordering of $c$. Assume that the selection rule $R$ selects literal $\ell$ when considering the singleton derivation $L$. From the induction hypothesis, there is another BCN($P$, $solv_X$)-derivation $L'_1 \leadsto^{n-1}_{(P,solv_X)} \Box\, c''$, using the selection rule $R'$, where $R'$ selects literals as they are selected by the rule $R$ when considering the derivation $L \leadsto_{(P,solv_X)} L'_1 \leadsto^{n-1}_{(P,solv_X)} \Box\, c''$. So, $c''$ is a reordering of $c'$ and hence of $c$. Thus, $L \leadsto_{(P,solv_X)} L'_1 \leadsto_{(P,solv_X)} \cdots \leadsto_{(P,solv_X)} L''_{n-1} \leadsto_{(P,solv_X)} \Box\, c''$ is the BCN($P$, $solv_X$)-derivation we were looking for. ∎


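Proof of Proposition 8. We are going to prove that $P^* \cup Th(Ax_X) \models (T_k^P(\ell) \to \ell)^{\forall}$ for each $k \in \mathbb{N}$, since it is easy to see that the general case is a straightforward consequence of Definition 3.

The proof follows by induction on $k$ and it merely relies on standard syntactical properties of first-order logic. For the base case, $k = 0$, the proposition trivially holds. Assume that the statement holds for $k' < k$. Assume $T_k^P(\ell)$ is satisfiable (if it is not satisfiable the proposition trivially holds). There are two cases:

1. $\ell = p(\bar{x})$. Then, applying twice the definition of $T_k^P$, the first time for atoms and the second time for the conjunction of literals, we obtain the following:

\[ T_k^P(\ell) = \bigvee_{i=1}^{m} \exists \bar{y}^i\,(c^i \land T_{k-1}^P(\bar{\ell}^i)) = \bigvee_{i=1}^{m} \exists \bar{y}^i\,\Bigl(c^i \land \bigwedge_{j=1}^{n_i} T_{k-1}^P(\ell_j^i)\Bigr) \]

Now, from the induction hypothesis we have that, for all $i \in \{1,\ldots,m\}$ and for all $j \in \{1,\ldots,n_i\}$:

\[ P^* \cup Th(Ax_X) \models (T_{k-1}^P(\ell_j^i) \to \ell_j^i)^{\forall} \]

Then, it follows logically that

\[ P^* \cup Th(Ax_X) \models \Bigl(\bigvee_{i=1}^{m} \exists \bar{y}^i\,\bigl(c^i \land \bigwedge_{j=1}^{n_i} T_{k-1}^P(\ell_j^i)\bigr) \to \bigvee_{i=1}^{m} \exists \bar{y}^i\,\bigl(c^i \land \bigwedge_{j=1}^{n_i} \ell_j^i\bigr)\Bigr)^{\forall} \]

And, again, applying the definition of $T_k^P$ we obtain the following:

\[ P^* \cup Th(Ax_X) \models \Bigl(T_k^P(p) \to \bigvee_{i=1}^{m} \exists \bar{y}^i\,\bigl(c^i \land \bigwedge_{j=1}^{n_i} \ell_j^i\bigr)\Bigr)^{\forall} \qquad (1) \]

In addition, by the completion of predicate $p(\bar{x})$, we have that

\[ P^* \cup Th(Ax_X) \models \Bigl(\bigvee_{i=1}^{m} \exists \bar{y}^i\,\bigl(c^i \land \bigwedge_{j=1}^{n_i} \ell_j^i\bigr) \leftrightarrow p(\bar{x})\Bigr)^{\forall} \qquad (2) \]

Hence, by (1) and (2), we can conclude that

\[ P^* \cup Th(Ax_X) \models (T_k^P(p(\bar{x})) \to p(\bar{x}))^{\forall}, \quad k > 0 \]

2. $\ell = \lnot p(\bar{x})$. Then, $T_k^P(\lnot p(\bar{x})) = F_k^P(p(\bar{x}))$, and applying the definition of $F_k^P$ we obtain the following:

\[ F_k^P(p(\bar{x})) = \bigwedge_{i=1}^{m} \forall \bar{y}^i\,(\lnot c^i \lor F_{k-1}^P(\bar{\ell}^i)) = \bigwedge_{i=1}^{m} \forall \bar{y}^i\,\Bigl(\lnot c^i \lor \bigvee_{j=1}^{n_i} F_{k-1}^P(\ell_j^i)\Bigr) \]

Using the induction hypothesis we have that, for all $i \in \{1,\ldots,m\}$, $j \in \{1,\ldots,n_i\}$:

\[ P^* \cup Th(Ax_X) \models (F_{k-1}^P(\ell_j^i) \to \lnot\ell_j^i)^{\forall} \]

Therefore, it follows logically that

\[ P^* \cup Th(Ax_X) \models \Bigl(\bigwedge_{i=1}^{m} \forall \bar{y}^i\,\bigl(\lnot c^i \lor \bigvee_{j=1}^{n_i} F_{k-1}^P(\ell_j^i)\bigr) \to \bigwedge_{i=1}^{m} \forall \bar{y}^i\,\bigl(\lnot c^i \lor \bigvee_{j=1}^{n_i} \lnot\ell_j^i\bigr)\Bigr)^{\forall} \]

Again, applying the definition of $F_k^P$, we have that

\[ P^* \cup Th(Ax_X) \models \Bigl(F_k^P(p) \to \bigwedge_{i=1}^{m} \forall \bar{y}^i\,\bigl(\lnot c^i \lor \bigvee_{j=1}^{n_i} \lnot\ell_j^i\bigr)\Bigr)^{\forall} \qquad (3) \]

Finally, we use the completion of the predicate $p(\bar{x})$ to obtain:

\[ P^* \cup Th(Ax_X) \models \Bigl(\bigwedge_{i=1}^{m} \forall \bar{y}^i\,\bigl(\lnot c^i \lor \bigvee_{j=1}^{n_i} \lnot\ell_j^i\bigr) \leftrightarrow \lnot p(\bar{x})\Bigr)^{\forall} \qquad (4) \]

Hence, by (3) and (4), we can conclude that

\[ P^* \cup Th(Ax_X) \models (F_k^P(p(\bar{x})) \to \lnot p(\bar{x}))^{\forall}, \quad k > 0 \]

∎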





Proof of proposition] To prove that (Domy/=, <) is a cpo, we show that each increas- 
ing chain {[.4;] }ier C Domy/=, [41] X ... < [An] < ..., has a least upper bound | ||An]. 
Let [4] be such that 4((c — ¢)”) = t iff, for some n, Ay((c — £)“) = t. Then, it is 
almost trivial to see that 








— for each n, [Ay] < [A] 
— for any other [8] such that [4,] < [8] for each n, [A] < [8]. 


Finally, it is trivial to see that [Ls] < [4] forall [a] € Doms/=. m 
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Proof of Theorem{Z2] First of all, 7, ue is monotonic as a consequence of the fact that 
Dp” is monotonic. Then, being 7, on monotonic, to prove that it is continuous it is 
enough to prove that it is finitary. That is: For each increasing chain {[4n]}nez, [a1] < 


bere: Aal S ae 


tp (UAn) < LTP” (lan) 


Let [4] = U[An] and [8] = gpx (L[An]) = [®p* (4)]. Let us assume B((c > 
£)”) = t. We have two cases: 


(a) If = p(X) then, by the definition of the operator Dp , we know there are (renamed 
versions of) clauses {p(X) :— & ,..., %4, 0di| 1 < i < m} in P and Dy-satisfiable 
constraints {ci |1<i<mA 1<j<nj} such that 

















e aló 4) =t | 

© Alle Vicien =i (A1rejen C} ^ ays 
In such a situation, by definition of |_|, we know that for each 1 < i < m and 1 < 
j < ni there is a [Ax] € {[An] |n € I} such that Ay ((ci => ey") = t. Then, since 
(Domy/=,~) is a cpo, we know that each finite sub-chain has a least upper bound 
in {[An] }ner. Let it be [4s]. In addition, since all models in Dy are elementarily 
equivalent we can state that 

© As((cj > fi") = 

© As((¢ > Vicicm Ai(N1<j<n; ci Adj))")=t 
Therefore, ®p* (4s)((c > p(X))”) = t so for all models C € [®p* (As)] we have 
that C((c = p(x))”) = t. Thus, by definition of |], this implies that for all C’ € 
Li[@®p* (an)] = UTP” ([An]) we have that c’(c > p(z)))" = t. 
The proof for £ = p(X) proceeds in the same way. That is, by the definition of 








(b 


ma 


the operator ®}*, we know that for each (renamed version) clause in {p(X) : 
- 4, Hodi] <i < m} = Defp(p(X))) there is a J; C {1,...n;} and Dy- 
satisfiable constraints {c} | 1 <i <m ^ j € Ji} such that 














© a(i >) 

© a((c > M<i<mYYi(V jez; ci Vadi) =t 
Again, by definition of |_|, we know that for each j € J there is a [a ;] € {[4n]|n € I} 
such that 4 ;((c; + 7€;)”) = t. Then, as a consequence of (Domy/=, <) being a 
cpo, and all models in Dy being elementarily equivalent, there is a class [As] in the 
chain such that 


e all> 
© As((c > Ni<i<mYYi(V je; ci, V ad) "St 


Therefore ®>* (.4s)((c + ap(X))”) = t so, for all models C € [®P* (4s)] we have 


that C((c + —p(x))”) = t. And, finally, by definition of |], this implies that for all 
D 


Cl € [DRE (an)] = L72” ([An]) we have that c'(c >=p@)) =t. m 
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Proof of Theorem[{I4| We prove that 1 and 2 hold for a goal oc. Then, the general 
case for Cac easily follows from the logical definition of the truth-value of (c — £)“ and 
(c= 8)", 

The Stuckey’s result states that P* UTh(Dom, ) 3 (c — £)” if, and only if, 


























DP fke > O") =t 
for some finite k. So, by definition of 7, is , this is equivalent to 
VA e Tp" Tk: A((coO’)=t 
for some finite k. And, by definition of |_|, to 
va e| lope" tk: a((c > O")=t 


Proof of Theorem{19| First of all, we have that LOG p(M ) = ALG p(M ) as a direct 
consequence of Theorem[I4](Extended Theorem of Stuckey). We will prove that 


- ALG p(M) x OP p(M ) and 
- OPp(M) XLOGp(M) 


(a) To prove that A £ G p(M ) < OP p(M ) we use induction on the number of iterations 
of 7" . We just consider goals such that £ = p(X) and £ = —p(z), since the general 
case follows from the properties of operators TË and FP and the fact that BCN is 


independent of the selection rule. The base case n = 0 is trivial, since Tp" 10 = [Lz] 
and |Lz]((c — £Y) # t for all £x -constraint c(¥) and all 2-literal (7). 

Assume that for all k < n, T8“ f k((c — £)“) = t implies oP p(M )((c > AY) =t. 
If 2 = p(X) then, by the definition of 7", we know that there are clauses {p(F) : 
—b),...,€,,0d;| 1 <i < m} in P and M -satisfiable constraints {c; |\l<i<mA1< 
j < ni} such that 7" UG = Lae = t and 

















s 


m(e> VA Ad))) =t 
j=l 


i=1 


Then, by the induction hypothesis we have that OP p(M )( (ci > ey’) = t for all 
1<i<mand 1 < j< ni. Thus, there exist successful BCN(P, M )-derivations for 
each 1 <i < mand 1< j <ni: 


























Ë di ~pa) Ty (Cj) \ di 


such that M (CEG) Adj)*) # £ and M (ci, > TGD =t. 
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Let k > 0 be the largest number in {ki |lL<i<mA1<j<n;}. Then, as a 
consequence of the monotonicity of the operator TP, we know M (( Nhi TË (4) A 
d;)*) # £. And since 
ni nj 
PING) = NKG) 
j=l j=l 
and 


MN i >T A @))") =E 
j=l j=l 


we have that 
m 


(e> VITEN &) Ad”) = 


i=l j=l 
That is, M (Tf, (p@))2) # £ and 





M ((c > Ti P) =t 
Therefore, we can guarantee the existence of a successful BCN (P, M )-derivation: 


PR)Ot ea OTe (PH) 


such that OP p(M )((c > p@)") =t 
The proof for l = —p(X) proceeds in the same way. That is, according to the def- 
inition of the operator 7p” , we know that for each (possibly renamed) clause in 
{p(X) :— &),...,0,0di | 1 < i < m} = Defp(p(z))) there is a Jj C {1,...nj} and 
M -satisfiable constraints {ó |1<i<m A^ j € Jj} such that: 

© TM Tn((c, > L) 

© M ((¢ > NicicmV¥i(V jes, CV =) = t£ 
Again, by the induction hypothesis we have that for all 1 < i < m and j € Jj, 


OPp(M (c = ati”) = t so, for some ri >0 






































M ((ch + FEE =E 


Let r > 0 be the largest number in {ri |1<i<mA j €J;}. Then, as a consequence 
of the monotonicity of the operator F?, we know M ((V je, Fe) Æ £. And, 


since F? (V jeg, Cj) = V jer Fr (Gj) and M ((V jeg, $ > FP (V jeu, €)))”) = t we have 
that 


M ((c > Fry (p(@)))") = 
Therefore, we can guarantee that p(X)ac is a BCN(P,™ )-failure, so 
OP p(M )((c > >p())") =t. 
Finally, we prove that OP p(M ) < LOG p(M ). Again we have two cases: 
(i) Suppose that Of p(M )((c + -0)”) = t so, Cac is a BCN(P,™ )-failed goal. 
Hence, M ((c > FP (0))”) = t, for some k > 0. Therefore, by Proposition [8] 
we can conclude that P* UTh(M ) = (c > 70)". 
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(ii) Suppose now that 0? p(™ )((c > 2)’) = t. Again we will prove the case £ = 
p(X) since the general case will follow from the properties of TE and the fact 
that BCN is independent of the selection rule. So we assume p(¥)oc has a 
BCN(P,™M )-derivation 





























P(RaL ea oTe (PE) 








such that M ((c > T? (p@)))“) = t. Then, again as a consequence of Proposi- 
tion[8] we can conclude that P* U TA(M ) H (c > p(X))”. a 
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Abstract. We introduce a mathematical framework for black-box soft- 
ware testing of functional correctness, based on concepts from stochastic 
process theory. This framework supports the analysis of two important 
aspects of testing, namely: (i) coverage, probabilistic correctness and 
reliability modelling, and (ii) test case generation. Our model corrects 
some technical flaws in previous models of probabilistic correctness 
in the literature. It also provides insight into the design of 
new testing strategies, which can be more efficient than random testing. 


1 Introduction 


Structural or glass-box testing of the functional correctness of software systems 
has been theoretically studied at least since the early 1950s (see for example 
[Moore 1956]). Although many useful structural test strategies have been de- 
veloped, (see for example the survey [Lee and Yannakakis 1996]), theoretical 
studies clearly indicate the limitations of structural testing. In particular the 
complexity of structural testing techniques often grows exponentially with the 
size of the system. 

To overcome this limitation, software engineers use black-box testing meth- 
ods for large systems (see e.g. ). We will assume that a functional 
requirement on a system S can be modelled as a pair (p,q) consisting of pre- 
condition p on the input data and a postcondition q on the output data of S. 
The simplest strategy for black-box testing is random testing, in which input 
vectors satisfying p are randomly generated, and the output of each execution is 
compared with the postcondition q as a test oracle. 

Efforts to improve on random testing, for example by careful manual design 
of test cases based on system knowledge and programming expertise, face the 
problem of proving their cost effectiveness. In fact, to date a general theoretical 
framework in which different black-box test strategies can be compared seems to 
be lacking. Constructing such a theory is a challenging problem for the theoretician, 
not least because when structural features of a system S are hidden, much 
less remains on which to build a mathematical model. Essentially we have only 
the pre- and postconditions p and q and the semantics or black-box behaviour 
of S. 
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In this paper we introduce a mathematical foundation for black-box testing 
of functional correctness based on the theory of stochastic processes. Our ap- 
proach is sufficiently general to deal with both: (i) coverage, test termination 
and reliability modelling, and (ii) efficient test case generation. These two issues 
seem to be central to any improvement in software testing technology. 

A system under test is invisible under the black-box methodology. Thus we 
can view its output as a source of stochastic behaviour under a sequence of tests. 
Our approach exploits the analogy between: 


(i) a black-box test which can be viewed as an input/output measurement made 
on an unseen and randomly given program, and 
(ii) a random variable which is a measurable function between two σ-fields. 


Thus we can study how different strategies for black-box testing involve dif- 
ferent assumptions about the finite-dimensional distributions (FDDs) of such 
random variables. While exact calculations of probabilities are generally diffi- 
cult, it seems possible to identify heuristics which simplify and approximate 
these calculations. These can form the basis for new test strategies and tools. 

The organisation of this paper is as follows. In Section 2 we formalise the 
concept of test success for a program S with respect to a functional correctness 
requirement {p}S{q}. In Section 3, we introduce the necessary concepts from 
measure theory and the theory of stochastic processes which are used to formalise 
probabilistic statements about testing. Our model corrects some technical flaws 
found in previous models of probabilistic correctness in the literature. In Section 
4, we consider the coverage problem and termination criteria for testing. In 
Section 5 we consider efficient test case generation. Section 6 considers open 
problems and future research. 


2 Logical Foundations of Functional Black-Box Testing 


In this section we formalise functional black-box testing within the traditional 
framework of program correctness. The principal concept to be defined is that 
of a successful black-box test for functional correctness. 

To simplify our exposition, we consider requirements specifications and 
computation over the ordered ring $\mathbb{Z}$ of integers. It should be clear that our approach 
can be generalised to any countable many-sorted data type signature $\Sigma$ (see for 
example [Loeckx et al. 1996]) and any minimal $\Sigma$ algebra $A$. 

The first-order language or signature $\Sigma^{ring}$ for an ordered ring of integers 
consists of two constant symbols 0, 1, three binary function symbols +, *, − 
and two binary relation symbols =, ≤. The ordered ring $\mathbb{Z}$ of integers is the 
first-order structure with domain $\mathbb{Z}$ where the constant, function and relation symbols 
are interpreted by the usual arithmetic constants, functions and relations. 

Let $X$ be a set of variables. The set $T(\Sigma^{ring}, X)$ of all terms is defined 
inductively in the usual way, and $T(\Sigma^{ring}) = T(\Sigma^{ring}, \emptyset)$ denotes the subset 
of all variable free or ground terms. If $\alpha : X \to \mathbb{Z}$ is any assignment then 
$\overline{\alpha} : T(\Sigma^{ring}, X) \to \mathbb{Z}$ denotes the term evaluation mapping. For any ground term $t$ 
we may write $t_{\mathbb{Z}}$ for $\overline{\alpha}(t)$. 

We assume the usual definition of the set $L(\Sigma^{ring}, X)$ of all first-order 
formulas over $\Sigma^{ring}$ and $X$ as the smallest set containing all atomic formulas 
(equations and inequalities) which is closed under the propositional connectives $\wedge$, $\neg$ 
and the quantifier $\forall$. The expression $\phi \vee \psi$ denotes $\neg(\neg\phi \wedge \neg\psi)$ while $\phi \to \psi$ 
denotes $\neg\phi \vee \psi$. The set $L_{\omega_1,\omega}(\Sigma^{ring}, X)$ of all infinitary first-order formulas 
over $\Sigma^{ring}$ and $X$ extends $L(\Sigma^{ring}, X)$ with closure under countably infinite 
conjunctions 

$$\bigwedge_{i \in I} \phi_i$$

for $I$ a countable set. (See e.g. .) This infinitary language plays 
a technical role in Section 3 to translate first-order formulas into probabilistic 
statements about testing. 

We assume the usual definitions of a free variable in a formula or term, and 
the substitution of a free variable by a term. 

Next we recall how first-order formulas are used to define pre- and 
postconditions for a program within the framework of the Floyd-Hoare theory of program 
correctness. For an overview of this theory see e.g. [de Bakker 1980]. 


Definition 1. For any set $X$ of variable symbols, define $X' = \{x' \mid x \in X\}$. A 
variable $x \in X$ is termed a prevariable while $x' \in X'$ is termed the corresponding 
postvariable. 

A precondition is a formula $p \in L(\Sigma^{ring}, X)$ with only prevariables, while 
a postcondition is a formula $q \in L(\Sigma^{ring}, X \cup X')$ that may have both pre- and 
postvariables. 


Let $\Omega(X)$ be an arbitrary programming language in which each program $\omega \in \Omega(X)$ 
has an interface $i(\omega) = \bar{x}$ where $\bar{x} = x_1, \ldots, x_k \in X^k$ is a finite sequence 
of length $k \ge 1$ of integer variables. The interface variables $x_i$ are considered to 
function as both input and output variables for $\omega$. If $\omega$ has the interface $\bar{x}$ we 
may simply write $\omega[\bar{x}]$. Note that, consistent with black-box testing, we assume 
no internal structure or syntax for $\Omega(X)$ programs. We will assume semantically 
that each program $\omega[\bar{x}] \in \Omega(X)$ has a simple deterministic transformational 
action on the initial state of $x_1, \ldots, x_k$. In the sequel, for any set $B$, we let 
$B_\bot = B \cup \{\bot\}$ where $\bot$ denotes an undefined value. We let $f : A \to B_\bot$ denote 
a partial function between sets $A$ and $B$ that may be undefined on any value 
$a \in A$, in which case we write $f(a) = \bot$. If $f(a)$ is defined and equal to $b \in B$ 
we write $f(a) = b$. We use $[A \to B_\bot]$ to denote the set of all partial functions 
from $A$ to $B$. 


Definition 2. Let $\Omega(X)$ be a programming language. By a semantic mapping 
for $\Omega(X)$ we mean a mapping of programs into partial functions, 

$$[\![\,\cdot\,]\!] : \Omega(X) \to \bigcup_{k \ge 1} [\mathbb{Z}^k \to \mathbb{Z}^k_\bot]$$

where for any program $\omega \in \Omega(X)$ with interface $i(\omega) = x_1, \ldots, x_k$ 

$$[\![\omega]\!] : \mathbb{Z}^k \to \mathbb{Z}^k_\bot$$

is a partial recursive function. 


Intuitively, for any input $a = a_1, \ldots, a_k \in \mathbb{Z}^k$, either $\omega$ fails to terminate when 
the interface variables $x_1, \ldots, x_k$ are initialised to $a_1, \ldots, a_k$ respectively, and 
$[\![\omega]\!](a) = \bot$, or else $\omega$ terminates under this initialisation, and $[\![\omega]\!](a) = 
b_1, \ldots, b_k$, where $b_i$ is the state of the interface variable $x_i$ after termination. 

Let us collect together the definitions introduced so far to formalise the 
concepts of test success and failure. Recall the usual satisfaction relation $\mathbb{Z}, \alpha \models \phi$ 
in $\mathbb{Z}$ for a formula $\phi$ (finitary or infinitary) under an assignment $\alpha : X \to \mathbb{Z}$ in 
the $\Sigma^{ring}$ structure $\mathbb{Z}$. 


Definition 3. Let $\omega \in \Omega(X)$ be a program with interface $i(\omega) = x_1, \ldots, x_k \in X^k$. 

(i) A functional specification $\{p\}\omega\{q\}$ is a triple, where $p \in L(\Sigma^{ring}, X)$ is a 
precondition and $q \in L(\Sigma^{ring}, X \cup X')$ is a postcondition. 


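(ii) A specification $\{p\}\omega\{q\}$ is said to be true in $\mathbb{Z}$ under assignments $\alpha : X \to \mathbb{Z}$ 
and $\beta : X' \to \mathbb{Z}$ if, and only if, whenever $\mathbb{Z}, \alpha \models p$ and 
$[\![\omega]\!](\alpha(x_1), \ldots, \alpha(x_k)) = \beta(x_1'), \ldots, \beta(x_k')$, we have $\mathbb{Z}, \alpha \cup \beta \models q$. 
If $\{p\}\omega\{q\}$ is true in $\mathbb{Z}$ under $\alpha$ and $\beta$ we write 

$$\mathbb{Z}, \alpha \cup \beta \models \{p\}\omega\{q\}.$$

We say that $\{p\}\omega\{q\}$ is valid in $\mathbb{Z}$ and write $\mathbb{Z} \models \{p\}\omega\{q\}$, if, and only if, for 
every $\alpha : X \to \mathbb{Z}$, if there exists $\beta : X' \to \mathbb{Z}$ such that $[\![\omega]\!](\alpha(x_1), \ldots, \alpha(x_k)) = 
\beta(x_1'), \ldots, \beta(x_k')$ then $\mathbb{Z}, \alpha \cup \beta \models \{p\}\omega\{q\}$. 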











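(iii) Let $p$ be a precondition and $q$ be a postcondition. For any $\alpha : X \to \mathbb{Z}$ we say 
that $\omega$ fails the test $\alpha$ of $\{p\}\omega\{q\}$ if, and only if, there exists $\beta : X' \to \mathbb{Z}$ such 
that $\mathbb{Z}, \alpha \models p$ and $[\![\omega]\!](\alpha(x_1), \ldots, \alpha(x_k)) = \beta(x_1'), \ldots, \beta(x_k')$ and 

$$\mathbb{Z}, \alpha \cup \beta \not\models q.$$

We say that $\omega$ passes the test $\alpha$ of $\{p\}\omega\{q\}$ if $\omega$ does not fail $\alpha$. 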


Intuitively $\omega$ fails the test $\alpha : X \to \mathbb{Z}$ of $\{p\}\omega\{q\}$ if $\alpha$ satisfies the 
precondition $p$, and $\omega$ terminates on the input $\alpha$ but the resulting output assignment 
$\beta : X' \to \mathbb{Z}$ does not satisfy the postcondition $q$. This definition is consistent with 
the partial correctness interpretation of validity for $\{p\}\omega\{q\}$ used in Definition 
3.(ii) (cf. [de Bakker 1980]). Partial correctness (rather than the alternative 
total correctness interpretation) is appropriate if we require a failed test case to 
produce an observably incorrect value in finite time rather than an unobservable 
infinite loop. In particular, for this choice of Definitions 3.(ii) and 3.(iii) we have 

$$\mathbb{Z} \not\models \{p\}\omega\{q\} \iff \omega \text{ fails some test } \alpha \text{ of } \{p\}\omega\{q\}.$$


Thus we have formalised program testing as the search for counterexamples 
to program correctness (under the partial correctness interpretation). 
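
The notions of failing and passing a test can be illustrated with a small executable sketch. It assumes, purely for illustration, that a program is a Python function from a tuple of integers to a tuple of integers, that non-termination is represented by returning `None` (standing in for the undefined value), and that pre- and postconditions are Python predicates. The names `fails_test`, `passes_test`, `double`, `pre` and `post` are hypothetical and do not come from the paper.

```python
from typing import Callable, Optional, Tuple

State = Tuple[int, ...]
Program = Callable[[State], Optional[State]]  # None models the undefined value


def fails_test(prog: Program,
               pre: Callable[[State], bool],
               post: Callable[[State, State], bool],
               a: State) -> bool:
    """True iff a satisfies pre, prog terminates on a, and the output violates post."""
    if not pre(a):
        return False      # inputs outside the precondition never count as failures
    b = prog(a)
    if b is None:
        return False      # non-termination is not an observable failure
    return not post(a, b)


def passes_test(prog, pre, post, a) -> bool:
    """A program passes a test exactly when it does not fail it."""
    return not fails_test(prog, pre, post, a)


# Hypothetical program: meant to double x, but wrong on input 3.
def double(a: State) -> Optional[State]:
    (x,) = a
    if x < 0:
        return None       # pretend the program diverges on negative inputs
    return (7,) if x == 3 else (2 * x,)


pre = lambda a: True                      # empty precondition
post = lambda a, b: b[0] == 2 * a[0]      # postcondition x' = 2 * x
```

Under this reading, searching for an input on which `fails_test` returns `True` is the search for a counterexample to partial correctness.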
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3 A Stochastic Calculus for Program Correctness 


Within the framework of program correctness outlined in Section 2, we wish 
to approach functional black-box testing in a quantitative way. The following 
question seems central. Given a specification $\{p\}\omega\{q\}$ suppose that $\omega$ has passed 
$n$ tests $a_1, \ldots, a_n$ of $p$ and $q$: for any new test $a_{n+1}$ what is the probability that 
$\omega$ will fail $a_{n+1}$? Intuitively, we expect this failure probability to monotonically 
decrease as a function of $n$. For fixed $n$ we also expect the failure probability to 
depend on the values chosen for $a_1, \ldots, a_n$. An optimal testing strategy would 
be to choose $a_{n+1}$ with the maximal probability that $\omega$ fails $a_{n+1}$, for each $n \ge 0$. 
In this section we will introduce a probabilistic model of testing that can be used 
to answer this question. 

Recall from probability theory the concept of a σ-algebra or σ-field $(\Omega, F)$ of 
events, where $\Omega$ is a non-empty set of elements termed outcomes and $F \subseteq \wp(\Omega)$ 
is a collection of sets, known as events, which is closed under complements, countable 
unions and countable intersections. (See e.g. [Kallenberg 1997].) Importantly for us, 
$(A, \wp(A))$ is a σ-field, for any countable set $A$, including any set of partial 
recursive functions. 


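Definition 4. Let $\bar{x} = x_1, \ldots, x_k \in X^k$, for $k \ge 1$, be an interface. By a 
sample space of programs over $\bar{x}$ we mean a pair 

$$\boldsymbol{\Omega}[\bar{x}] = (\Omega[\bar{x}], eval),$$

where $\Omega[\bar{x}] \subseteq \Omega(X)$ is a subset of programs $\omega[\bar{x}]$ all having the same interface 
$\bar{x}$, and $eval : \Omega[\bar{x}] \times \mathbb{Z}^k \to \mathbb{Z}^k_\bot$ is the program evaluation mapping given by 

$$eval(\omega, a_1, \ldots, a_k) = [\![\omega]\!](a_1, \ldots, a_k).$$

We say that $\boldsymbol{\Omega}[\bar{x}]$ is extensional if, and only if, for all programs $\omega, \omega' \in \Omega[\bar{x}]$ 

$$\big(\forall a \in \mathbb{Z}^k \;\; eval(\omega, a) = eval(\omega', a)\big) \to \omega = \omega'.$$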
We consider extensional sample spaces of programs only. It is important for 
distribution modelling that all programs $\omega \in \Omega[\bar{x}]$ have the same interface $\bar{x}$. 
Recall that a probability measure $P$ on a σ-field $(\Omega, F)$ is a function $P : F \to 
[0, 1]$ satisfying: $P(\emptyset) = 0$, $P(\Omega) = 1$, and for any collection $e_1, e_2, \ldots \in F$ of 
events which are pairwise disjoint, i.e. $i \ne j \Rightarrow e_i \cap e_j = \emptyset$, 

$$P\Big(\bigcup_{i=1}^{\infty} e_i\Big) = \sum_{i=1}^{\infty} P(e_i).$$

The triple $(\Omega, F, P)$ is termed a probability space. 
Let $(\Omega, F)$ and $(\Omega', F')$ be σ-fields. A function $f : \Omega \to \Omega'$ is said to be 
measurable, if, and only if, for each event $e \in F'$, 

$$f^{-1}(e) \in F.$$

Let $(\Omega, F, P)$ be a probability space. A random variable $X : \Omega \to \Omega'$ is a 
measurable function. If $X : \Omega \to \Omega'$ is any random variable then $X$ induces a 
probability function $P_X : F' \to [0, 1]$ defined on any $e \in F'$ by 

$$P_X(e) = P(X^{-1}(e)).$$

Then $(\Omega', F', P_X)$ is a probability space. Thus one important role of a random 
variable that we will exploit in Definition 9 is to transfer a probability measure 
from $F$-events onto $F'$-events. 
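
The transfer of a probability measure along a random variable can be made concrete on a finite sample space. The sketch below assumes a discrete probability space given as a Python dictionary from outcomes to weights; the function names `pushforward` and `prob_of_event`, and the die example, are illustrative choices and are not taken from the paper.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(P, X):
    """Return the induced measure as a dict: value y of X -> P(X^{-1}({y}))."""
    PX = defaultdict(Fraction)            # Fraction() is 0
    for outcome, weight in P.items():
        PX[X(outcome)] += weight
    return dict(PX)


def prob_of_event(PX, event):
    """Induced probability of an event, i.e. a set of values of X."""
    return sum((w for y, w in PX.items() if y in event), Fraction(0))


# Hypothetical example: a fair six-sided die and the random variable "parity".
P = {i: Fraction(1, 6) for i in range(1, 7)}
parity = lambda i: i % 2
PX = pushforward(P, parity)
```

Here `PX` plays the role of the induced probability function: the weight of each value is the total weight of its preimage.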

We may take more than one measurement on the outcome of any random 
experiment. We can consider any finite number or even an infinite number of 
measurements. This leads us naturally to the important concept of a stochastic 
process. Let $I$ be any non-empty set. A stochastic process $S$ over $(\Omega', F')$ is 
an $I$-indexed family of random variables $S = (S_i : \Omega \to \Omega' \mid i \in I)$. For each 
$\omega \in \Omega$, the function $S(\omega) : I \to \Omega'$ defined by 

$$S(\omega)(i) = S_i(\omega)$$

is termed a path of the process $S$. (See e.g. [Grimmet et al. 1982].) 


Definition 5. Let $\bar{x} = x_1, \ldots, x_k \in X^k$, for $k \ge 1$, be an interface and let 
$\boldsymbol{\Omega}[\bar{x}]$ be a sample space of programs. Define 

$$I_{\bar{x}} = \{x_1, \ldots, x_k\} \times \mathbb{Z}^k$$

to be an indexing set for a family of random variables. For each index 
$(x_i, a_1, \ldots, a_k) \in I_{\bar{x}}$, define the random variable $S_{(x_i, a_1, \ldots, a_k)} : \Omega[\bar{x}] \to \mathbb{Z}_\bot$ by 

$$S_{(x_i, a_1, \ldots, a_k)}(\omega) = \begin{cases} \bot, & \text{if } eval(\omega, a_1, \ldots, a_k) = \bot, \\ eval(\omega, a_1, \ldots, a_k)_i, & \text{otherwise.} \end{cases}$$

Thus $S_{(x_i, a_1, \ldots, a_k)}(\omega)$ gives the output obtained for the interface variable $x_i$ by 
executing program $\omega$ on the input $a_1, \ldots, a_k$. 
A path $S(\omega) : \{x_1, \ldots, x_k\} \times \mathbb{Z}^k \to \mathbb{Z}_\bot$ for the stochastic process $S = 
(S_i \mid i \in I_{\bar{x}})$ gives the entire input/output behaviour of the program $\omega$. 
Definition 5 exploits the analogy between: (i) a black-box test which can be 
viewed as an input/output measurement made on an unseen and therefore essen- 
tially randomly given program, and (ii) a random variable which is a measurable 
function between two σ-fields. A test sequence is just a sequence of such 
input/output measurements made on the same unseen program. Therefore, it is 
natural to model all possible tests on the same program as a path of a stochastic 
process. Hence we arrive at the model given by Definition 5. 
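
As a rough illustration of Definition 5, the following sketch treats a small finite collection of total single-variable programs as the sample space, with an assumed uniform distribution. The program names, the helper names `S`, `path` and `prob_output`, and the uniform weights are all hypothetical; the paper itself only requires a countable extensional sample space.

```python
from fractions import Fraction

# Hypothetical finite sample space of single-variable programs (all total here).
programs = {
    "identity": lambda a: a,
    "double":   lambda a: 2 * a,
    "square":   lambda a: a * a,
    "zero":     lambda a: 0,
}
P = {name: Fraction(1, 4) for name in programs}   # assumed uniform distribution


def S(a):
    """Random variable indexed by (x, a): maps a program name to its output on a."""
    return lambda name: programs[name](a)


def path(name, inputs):
    """Restriction of the path of one program to a finite set of inputs."""
    return {a: S(a)(name) for a in inputs}


def prob_output(a, b):
    """Total weight of the programs whose output on input a equals b."""
    return sum((w for name, w in P.items() if S(a)(name) == b), Fraction(0))
```

Each test input `a` selects one random variable, and a sequence of tests on one unseen program reads off successive points of that program's path.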
Modelling programs as stochastic processes in this way now makes it possible 
to derive probabilistic statements about test success and failure. 
In the remainder of this section, we let $(\Omega[\bar{x}], F, P)$ denote an arbitrary 
probability space, where $\boldsymbol{\Omega}[\bar{x}] = (\Omega[\bar{x}], eval)$ is a countable extensional sample 
space of programs, and $F = \wp(\Omega[\bar{x}])$. In order to assign probabilities to pre- 
and postconditions on any program $\omega \in \Omega[\bar{x}]$ we need to be able to represent 
these as events (i.e. sets of programs) within the σ-field $F$. For this we begin by 
showing how first-order terms can be analysed in terms of the random variables 
introduced in Definition 5. 

Now $\mathbb{Z}$ is a minimal $\Sigma^{ring}$ structure. So by definition, for any integer $i \in \mathbb{Z}$ 
there exists a canonical numeral $\bar{i} \in T(\Sigma^{ring})$ which denotes $i$ in $\mathbb{Z}$, i.e. $\bar{i}_{\mathbb{Z}} = i$. 

Definition 6. For any assignment $\alpha : X \to \mathbb{Z}$, we define the translation 
mapping 

$$\alpha^* : T(\Sigma^{ring}, X \cup \{x_1', \ldots, x_k'\}) \to T(\Sigma^{ring}, I_{\bar{x}})$$

by induction on terms. 
(i) $\alpha^*(0) = 0$ and $\alpha^*(1) = 1$. 
(ii) For any variable $x \in X$, $\alpha^*(x) = \overline{\alpha(x)}$. 
(iii) For any postvariable $x_i' \in \{x_1', \ldots, x_k'\}$, 

$$\alpha^*(x_i') = (x_i, \alpha(x_1), \ldots, \alpha(x_k)).$$

(iv) For any terms $t_1, t_2 \in T(\Sigma^{ring}, X \cup X')$, and for any function symbol 
$op \in \{+, *, -\}$ 

$$\alpha^*(t_1 \; op \; t_2) = (\alpha^*(t_1) \; op \; \alpha^*(t_2)).$$

In essence $\alpha^*$ replaces each logical variable and prevariable with the name of its 
value under $\alpha$. Also $\alpha^*$ replaces each postvariable with the index of its 
corresponding random variable under $\alpha$. 

In order to translate first-order formulas into events we extend $\alpha^*$ to 
all first-order formulas by mapping these into the quantifier-free fragment of 
$L_{\omega_1,\omega}(\Sigma^{ring}, I_{\bar{x}})$, in which even bound variables have disappeared. 


Definition 7. For any assignment $\alpha : X \to \mathbb{Z}$, we define the translation 
mapping 

$$\alpha^* : L(\Sigma^{ring}, X \cup \{x_1', \ldots, x_k'\}) \to L_{\omega_1,\omega}(\Sigma^{ring}, I_{\bar{x}})$$

by induction on formulas. 
(i) For any terms $t_1, t_2 \in T(\Sigma^{ring}, X \cup \{x_1', \ldots, x_k'\})$ and any relation symbol 
$R \in \{\le, =\}$, 

$$\alpha^*(t_1 \; R \; t_2) = (\alpha^*(t_1) \; R \; \alpha^*(t_2)).$$

(ii) For any formulas $\phi_1, \phi_2 \in L(\Sigma^{ring}, X \cup \{x_1', \ldots, x_k'\})$, 

$$\alpha^*(\phi_1 \wedge \phi_2) = (\alpha^*(\phi_1) \wedge \alpha^*(\phi_2))$$
$$\alpha^*(\neg\phi_1) = \neg(\alpha^*(\phi_1))$$

(iii) For any variable $x \in X \cup \{x_1', \ldots, x_k'\}$, and any formula $\phi \in L(\Sigma^{ring}, X \cup 
\{x_1', \ldots, x_k'\})$, 

$$\alpha^*(\forall x \, \phi) = \bigwedge_{i \in \mathbb{Z}} \alpha[z \to i]^*(\phi[x/z])$$

where $z \in X - \{x_1, \ldots, x_k\}$ is the least (non-interface) variable (under a fixed 
enumeration) such that $z$ is not free in $\phi$ and $\alpha[z \to i] : X \to \mathbb{Z}$ agrees with $\alpha$ 
everywhere except on $z$, where $\alpha[z \to i](z) = i$. 


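Now we can easily associate an event consisting of a set of programs with every 
quantifier free formula $\phi \in L_{\omega_1,\omega}(\Sigma^{ring}, I_{\bar{x}})$ as follows. 


Definition 8. Define the event set $\mathcal{E}(\phi) \subseteq \Omega[\bar{x}]$ for each quantifier free formula 
$\phi \in L_{\omega_1,\omega}(\Sigma^{ring}, I_{\bar{x}})$ by induction on formulas. 
(i) For any terms $t_1[i_1, \ldots, i_m], t_2[i_1, \ldots, i_m] \in T(\Sigma^{ring}, I_{\bar{x}})$, and any 
relation symbol $R \in \{\le, =\}$, 

$$\mathcal{E}(t_1 \; R \; t_2) = (S_{i_1}, \ldots, S_{i_m})^{-1}\big(\{ b \in \mathbb{Z}^m \mid \mathbb{Z}, b \models t_1 \; R \; t_2 \}\big).$$

(ii) For any quantifier free formulas $\phi_1, \phi_2 \in L_{\omega_1,\omega}(\Sigma^{ring}, I_{\bar{x}})$, 

$$\mathcal{E}(\phi_1 \wedge \phi_2) = \mathcal{E}(\phi_1) \cap \mathcal{E}(\phi_2)$$
$$\mathcal{E}(\neg\phi_1) = \Omega[\bar{x}] - \mathcal{E}(\phi_1)$$

(iii) For any countable family of quantifier free formulas 
$(\phi_i \in L_{\omega_1,\omega}(\Sigma^{ring}, I_{\bar{x}}) \mid i \in I)$, 

$$\mathcal{E}\Big(\bigwedge_{i \in I} \phi_i\Big) = \bigcap_{i \in I} \mathcal{E}(\phi_i).$$

Notice in Definition 8.(i) we assume that $\{ b \in \mathbb{Z}^m \mid \mathbb{Z}, b \models t_1 \; R \; t_2 \}$ is 
indeed an event on $\mathbb{Z}^m$. In the case that we take the discrete σ-field $\wp(\mathbb{Z}^m)$ this 
requirement is trivially satisfied. 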

Using Definition 8 we can now translate the probability distribution P on 
programs into probability values for correctness statements. Thus we come to 
the central definitions of this section. 





Definition 9. Let $\phi \in L(\Sigma^{ring}, X \cup \{x_1', \ldots, x_k'\})$ be any formula. 
(i) We define the probability that $\phi$ is satisfiable under $\alpha : X \to \mathbb{Z}$ by 

$$P(Sat_\alpha(\phi)) = P(\mathcal{E}(\alpha^*(\phi))).$$

(ii) We define the probability that $\phi$ is satisfiable by 

$$P(Sat(\phi)) = P\Big(\bigcup_{\alpha : X \to \mathbb{Z}} \mathcal{E}(\alpha^*(\phi))\Big).$$

(iii) We define the probability that $\phi$ is valid by 

$$P(\mathbb{Z} \models \phi) = P\Big(\bigcap_{\alpha : X \to \mathbb{Z}} \mathcal{E}(\alpha^*(\phi))\Big).$$





Definition 9.(i) provides a rigorous mathematical answer to the initial 
question of this section. It is helpful to illustrate this definition with some simple 
examples. 


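Example 1. Let $x \in X$ be a single variable interface. 
(i) Consider the specification 

$$\{\;\}\omega\{x' = m * x + c\},$$

which asserts that $\omega[x]$ computes a linear function $f(x) = mx + c$ of its input 
variable $x$. For any input assignment $\alpha$ of $a \in \mathbb{Z}$ to $x$ 

$$\mathcal{E}(\alpha^*(x' = m * x + c)) = \mathcal{E}((x, a) = m * a + c)$$
$$= S_{(x, a)}^{-1}\{ b \in \mathbb{Z} \mid \mathbb{Z}, b \models (x, a) = m * a + c \}$$
$$= \{ \omega \in \Omega[x] \mid eval(\omega, a) = m * a + c \}.$$

Thus 

$$P(Sat_\alpha(\neg \, x' = m * x + c)) = P(\{ \omega \in \Omega[x] \mid eval(\omega, a) \ne m * a + c \})$$

is the probability that a randomly chosen single variable program $\omega[x] \in \Omega[x]$ 
fails the postcondition $x' = m * x + c$ on the test input $a \in \mathbb{Z}$. 

(ii) More generally, the probability that a program $\omega[x]$ will fail a test $a_{k+1} \in \mathbb{Z}$ 
of $p$ and $q$ given that $\omega$ has already passed $k$ tests $a_1, \ldots, a_k \in \mathbb{Z}$ with output 
$b_1, \ldots, b_k \in \mathbb{Z}$ is the conditional probability 

$$P\big( Sat_{a_{k+1}}(p \wedge \neg q) \;\big|\; \{ \omega \in \Omega[x] : eval(\omega, a_i) = b_i \text{ for } 1 \le i \le k \} \big).$$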
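
The conditional probability in Example 1.(ii) can be computed by direct enumeration when the sample space is finite. The sketch below assumes a toy sample space of linear programs x -> m*x + c with coefficients in a small range, a uniform prior, and an empty precondition; all of these, together with the names `space`, `eval_prog` and `conditional_failure`, are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: programs x -> m*x + c with small integer coefficients.
space = [(m, c) for m, c in product(range(-2, 3), repeat=2)]
P = {w: Fraction(1, len(space)) for w in space}     # assumed uniform prior


def eval_prog(w, a):
    m, c = w
    return m * a + c


def prob(event):
    """Probability of the set of programs satisfying the predicate `event`."""
    return sum((P[w] for w in space if event(w)), Fraction(0))


def conditional_failure(post, passed, a_next):
    """P(program fails post on a_next | eval(program, a_i) = b_i for all (a_i, b_i) in passed)."""
    consistent = lambda w: all(eval_prog(w, a) == b for a, b in passed)
    fails = lambda w: consistent(w) and not post(a_next, eval_prog(w, a_next))
    return prob(fails) / prob(consistent)


post = lambda a, b: b == 2 * a + 1                  # specification x' = 2*x + 1
```

In this toy space two passed tests already determine the program, so the conditional failure probability of any further test drops to zero; richer sample spaces decay more slowly.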


Definition 9 satisfies several intuitive properties. 























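Proposition 1. Let $\phi, \psi \in L(\Sigma^{ring}, X \cup \{x_1', \ldots, x_k'\})$ be any formulas. 
(i) If $\mathbb{Z} \models \phi$ then $P(\mathbb{Z} \models \phi) = 1$. 

(ii) If $\mathbb{Z} \models \neg\phi$ then $P(\mathbb{Z} \models \phi) = 0$. 

(iii) If $\mathbb{Z} \models \phi \to \psi$ then $P(\mathbb{Z} \models \phi) \le P(\mathbb{Z} \models \psi)$. 

(iv) $P(\mathbb{Z} \models \phi) = 1 - P(Sat(\neg\phi))$. 

(v) If $\mathbb{Z} \models \phi \leftrightarrow \psi$ then $P(\mathbb{Z} \models \phi) = P(\mathbb{Z} \models \psi)$. 


Proof. Follows easily from Definition 9. 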

















A Stochastic Theory of Black-Box Software Testing 587 


By Proposition 1.(v) the probability that a correctness formula is valid is 
independent of its syntactic structure and depends only on its semantics. 

Although these properties are intuitive, they are not satisfied by any of the 
reliability models of [Hamlet 1987], or [Thayer et al. 1978]. 
For example, these models all assign a probability p < 1 of satisfying a tautology. 
We believe this points to a significant conceptual flaw in existing models of 
probabilistic correctness in the literature. 


4 Test Coverage and Software Reliability 


Given that $n$ tests of a program $\omega[\bar{x}]$ are unsuccessful in finding an error in $\omega$, 
what is the probability that $\omega$ satisfies a specification $\{p\}\omega\{q\}$? If it is possible 
to calculate or even estimate this probability value, then we have a clearly de- 
fined stopping criterion for black-box testing: we may terminate when a desired 
probability of correctness has been achieved. Thus the concept of probability of 
correctness gives a formal model of black-box test coverage, where by coverage 
we mean the extent of testing. We shall apply the theoretical model introduced 
in Section 3 to consider the problem of estimating the probability of correctness. 

An obvious technical problem is to find a distribution $P$ on programs which is 
realistic. However, to begin to study coverage and the testing termination 
problem we can use very simple heuristic probability distributions, and examine 
how calculations can be made. 

For simplicity, we consider programs with a single integer variable interface 
$x \in X$. Let 

$$\boldsymbol{\Omega}[x] = (\Omega[x], eval).$$

Also, for simplicity, we assume that $\Omega[x]$ is a subrecursive language, i.e. each 
program $\omega[x] \in \Omega[x]$ terminates on all inputs. Thus $eval : \Omega[x] \times \mathbb{Z} \to \mathbb{Z}$ is also a 
total function, which allows us to work with totally defined random walk models 
of paths. 

To calculate the probability of satisfying a formula $\phi$, we need a probability 
distribution $P : \wp(\Omega[x]) \to [0, 1]$. Recalling Definition 5, one approach is to 
consider the associated family of random variables 

$$S = (S_{(x, i)} : \Omega[x] \to \mathbb{Z} \mid i \in \mathbb{Z}).$$

A simple model of the FDDs of these random variables is to assume a random 
walk hypothesis. Writing $S_i$ for $S_{(x, i)}$ we can relate $S_n$ and $S_{n+1}$ by a formula 
$S_{n+1} = S_n + X_n$ where 

$$(X_i : \Omega[x] \to \mathbb{Z} \mid i \in \mathbb{Z})$$

is another family of random variables. A simple relationship is to define an 
exponential distribution on the $X_i$ by 

$$P(X_i = +n) = P(X_i = -n) = \frac{1}{3}\left(\frac{1}{2}\right)^{n}$$

for all $n \in \mathbb{N}$. 

An exponential random walk over a finite interval can model any total 
function over that interval. This distribution captures the simple intuition that fast 
growing functions are increasingly unlikely. Furthermore it is easily analysed, 
and for simple formulas, we can estimate the probability of satisfiability. After 
$n$ test passes on the inputs $a_1 < a_2 < \ldots < a_n \in \mathbb{Z}$ we have a sequence of 
$n-1$ intervals $[a_i, a_{i+1}]$. We can consider the probability of satisfying a formula 
$p \wedge \neg q$ (where $p$ is a precondition and $q$ is a postcondition) over each of these 
intervals separately. To perform such an analysis, we first note that the interval 
$[a_i, a_{i+1}]$ can be renormalised to the interval $[0, a_{i+1} - a_i]$ without affecting 
the probability values. (An exponential random walk is homogeneous along the 
x-axis.) Let us consider probabilities for the individual paths over an interval 
$[0, a]$. 

The probability of a path following an exponential random walk is a function 
of its length and its volatility. 
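
To make the exponential random walk tangible, the sketch below assumes the symmetric step law P(X = +n) = P(X = -n) = (1/3)(1/2)^n, which is the normalised geometric law with ratio 1/2, and draws sample walks from it. The function names and the sampling scheme (zero step with probability 1/3, otherwise a geometric magnitude with a random sign) are illustrative assumptions rather than anything prescribed by the paper.

```python
import random
from fractions import Fraction


def step_prob(n: int) -> Fraction:
    """Assumed step law: P(X = n) = (1/3) * (1/2)**|n| for every integer n."""
    return Fraction(1, 3) * Fraction(1, 2) ** abs(n)


def partial_mass(bound: int) -> Fraction:
    """Total probability of all steps n with |n| <= bound."""
    return sum((step_prob(n) for n in range(-bound, bound + 1)), Fraction(0))


def sample_step(rng: random.Random) -> int:
    """Draw one step: 0 with probability 1/3, else +-k with P(k) = (1/2)**k per sign."""
    if rng.random() < 1 / 3:
        return 0
    magnitude = 1
    while rng.random() < 0.5:
        magnitude += 1
    return magnitude if rng.random() < 0.5 else -magnitude


def sample_walk(start: int, length: int, rng: random.Random):
    """Return a path y_0, ..., y_length with independent steps."""
    ys = [start]
    for _ in range(length):
        ys.append(ys[-1] + sample_step(rng))
    return ys
```

The partial sums of the step law approach 1 geometrically, which is a quick sanity check that the assumed constants define a probability distribution.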





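Definition 10. Let $y = y_0, y_1, \ldots, y_n \in \mathbb{Z}^{n+1}$ be a path of length $n \ge 1$. We 
define the volatility $\lambda(y) \in \mathbb{Z}$ of $y$ by 

$$\lambda(y) = \sum_{i=1}^{n} |y_i - y_{i-1}|.$$


Proposition 2. Let $y = y_0, y_1, \ldots, y_n \in \mathbb{Z}^{n+1}$ be a path of length $n \ge 1$. Then 

$$P(S_1 = y_1, \ldots, S_n = y_n \mid S_0 = y_0) = \left(\frac{1}{3}\right)^{n} \left(\frac{1}{2}\right)^{\lambda(y)}.$$


Proof. By induction on n. 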

Let us consider monotone paths. 
Definition 11. Let $y = y_0, y_1, \ldots, y_n \in \mathbb{Z}^{n+1}$ be a path of length $n \ge 1$. We 
say that $y$ is monotone if $y_0 \le y_1 \le \ldots \le y_n$ or $y_0 \ge y_1 \ge \ldots \ge y_n$. 


Proposition 3. Let $y = y_0, y_1, \ldots, y_n \in \mathbb{Z}^{n+1}$ be a path of length $n \ge 1$. 
(i) If $y$ is monotone then $\lambda(y) = |y_n - y_0|$. 
(ii) If $y$ is non-monotone then $\lambda(y) > |y_n - y_0|$. 


Proof. By induction on the length of paths. 


Thus it is easy to calculate the probability of a monotone path. 


Corollary 1. Let $y = y_0, y_1, \ldots, y_n \in \mathbb{Z}^{n+1}$ be a monotone path of length n.
Then

\[ P( S_1 = y_1, \ldots, S_n = y_n \mid S_0 = y_0 ) = \left( \frac{1}{3} \right)^{n} \left( \frac{1}{2} \right)^{|y_n - y_0|}. \]

Proof. Immediate from Propositions 2 and 3.


So the probability of a monotone path under the exponential distribution is
independent of the steps taken, and depends only on the start and end points. In
fact, by Proposition 3, this property characterises the monotone paths. Furthermore,
any non-monotone path y from $y_0$ to $y_n$ has exponentially lower probability
than any monotone path from $y_0$ to $y_n$, by a factor $(1/2)^{\lambda(y) - |y_n - y_0|}$.
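
As a small illustration of Definition 10, Propositions 2 and 3 and Corollary 1, the following Python sketch computes the volatility and the probability of a path. It assumes the step law $P(X_i = \pm n) = (1/3)(1/2)^n$ for the exponential random walk; the function names are our own and are not taken from the paper.

```python
from fractions import Fraction


def volatility(path):
    """Sum of the absolute differences between consecutive path values."""
    return sum(abs(b - a) for a, b in zip(path, path[1:]))


def is_monotone(path):
    """True if the path never decreases, or never increases."""
    steps = [b - a for a, b in zip(path, path[1:])]
    return all(s >= 0 for s in steps) or all(s <= 0 for s in steps)


def path_probability(path):
    """Probability of the path given its start point.

    Assumes independent steps with P(X = +n) = P(X = -n) = (1/3)(1/2)^n,
    which gives (1/3)^n * (1/2)^volatility for a path of length n.
    """
    n = len(path) - 1
    if n < 1:
        raise ValueError("a path needs at least two points")
    return Fraction(1, 3) ** n * Fraction(1, 2) ** volatility(path)
```

For example, the three monotone paths from 0 to 2 of length 2 all receive the same probability, while the detour 0, 3, 2 is less likely by the factor $(1/2)^{\lambda(y) - |y_n - y_0|} = 1/4$.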


Let $C^R(n, r)$ be the number of ways of selecting a collection of r objects
from a total of n objects with repetitions. Then

\[ C^R(n, r) = C(n + r - 1, r) \]

where $C(n, r)$ is the binomial coefficient defined by

\[ C(n, r) = \frac{n(n-1) \cdots (n-r+1)}{r!}. \]

Recall the well known upper negation identity (see e.g. [Graham et al. 1989])

\[ C(n, r) = (-1)^r C(r - n - 1, r), \]

from which we can infer $C^R(n, r) = (-1)^r C(-n, r)$.
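
The two identities for $C^R$ can be checked numerically on small values. The sketch below is only a sanity check under our own naming: `multiset_count` is the closed form $C(n+r-1, r)$, `multiset_count_by_enumeration` counts selections with repetition directly, and `general_binomial` evaluates the product formula for $C(x, r)$ so that a negative upper index is allowed.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb, factorial


def multiset_count(n, r):
    """C^R(n, r) via the closed form C(n + r - 1, r), for n >= 1 and r >= 0."""
    return comb(n + r - 1, r)


def multiset_count_by_enumeration(n, r):
    """C^R(n, r) by listing every selection of r objects out of n with repetition."""
    return sum(1 for _ in combinations_with_replacement(range(n), r))


def general_binomial(x, r):
    """C(x, r) = x (x - 1) ... (x - r + 1) / r!, valid for any integer x."""
    numerator = 1
    for j in range(r):
        numerator *= x - j
    return Fraction(numerator, factorial(r))
```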
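Theorem 1. For any $n \geq 1$ and $y_0, y_n \in \mathbb{Z}$,

\[ P( S_n = y_n \mid S_0 = y_0 ) = \left( \frac{1}{3} \right)^{n} \left( \frac{1}{2} \right)^{|y_n - y_0|} \left( C^R(n, |y_n - y_0|) + \sum_{i > 0} \left( \frac{1}{2} \right)^{2i} \sum_{k=1}^{\min(i,\, n-1)} C(n, k)\, C^R(k, i-k)\, C^R(n-k, |y_n - y_0| + i) \right). \]
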
Proof. Apply Proposition 2 and sum over all volatility values. 
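
The idea of summing over volatility values can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's code: it assumes the step law $P(X = \pm n) = (1/3)(1/2)^n$, and the volatility sum is our own reading of the theorem (k backward steps of total size i, the remaining n - k steps covering $|y_n - y_0| + i$, so the volatility is $|y_n - y_0| + 2i$). It is compared with a direct convolution of the step distribution. Both computations are truncated, so the results are approximations.

```python
from math import comb


def step_probability(x):
    """Assumed step law: P(X = x) = (1/3) * (1/2)^|x|."""
    return (1 / 3) * 0.5 ** abs(x)


def multiset_count(n, r):
    """C^R(n, r) = C(n + r - 1, r)."""
    return comb(n + r - 1, r)


def endpoint_probability_by_volatility(n, d, i_max=60):
    """P(S_n = y_0 + d | S_0 = y_0), grouping paths by volatility |d| + 2i."""
    d = abs(d)
    total = multiset_count(n, d)  # monotone paths, volatility |d|
    for i in range(1, i_max + 1):
        inner = 0
        for k in range(1, min(i, n - 1) + 1):
            inner += comb(n, k) * multiset_count(k, i - k) * multiset_count(n - k, d + i)
        total += inner * 0.25 ** i
    return (1 / 3) ** n * 0.5 ** d * total


def endpoint_probability_by_convolution(n, d, bound=60):
    """Same probability by convolving n steps, each truncated to [-bound, bound]."""
    dist = {0: 1.0}
    for _ in range(n):
        new = {}
        for pos, p in dist.items():
            for x in range(-bound, bound + 1):
                new[pos + x] = new.get(pos + x, 0.0) + p * step_probability(x)
        dist = new
    return dist.get(d, 0.0)
```

For n = 2 the values can also be worked out by hand: the probability of returning to the start point is 5/27, and the probability of ending one unit away is 4/27.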


To illustrate the approach, we estimate the probability of a single variable
program $\omega[x]$ failing a test of a simple linear equational specification

\[ \{\ \}\ \omega\ \{ x' = m * x + c \}, \]

within an interval [0, n]. (Recall Example 1.) We may assume that $\omega$ passes both
tests of the endpoints 0 and n.


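Theorem 2. (i) For any $n \geq 1$ and any $c \in \mathbb{Z}$, a bound holds for

\[ P( S_i \neq c \text{ for some } 0 < i < n \mid S_0 = c,\ S_n = c ) \]

(the right-hand side of this bound is illegible in the source text).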


(ii) For any $n \geq 1$ and any $m, c \in \mathbb{Z}$, where $m \neq 0$,

\[ P( S_i \neq mi + c \text{ for some } 0 < i < n \mid S_0 = c,\ S_n = mn + c ) \geq 1 - \frac{1}{C^R(n, |nm|)}. \]

Proof. (i) Follows from Proposition 2 and Theorem 1 by considering non-monotone
paths with highest probability satisfying $S_i \neq c$ for some $0 < i < n$.


(ii) Clearly, there is only one monotone path $y_0, \ldots, y_n$ of length n, satisfying
$y = mx + c$, namely

\[ y_0 = c,\ y_1 = m + c,\ \ldots,\ y_n = nm + c. \]


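So for all monotone paths from c to nm + c excluding y, using Corollary 1 and
Theorem 1,

\[ P( S_n = nm + c,\ S_i \neq mi + c \text{ for some } 0 < i < n \mid S_0 = c ) = \left( C^R(n, |nm|) - 1 \right) \left( \frac{1}{3} \right)^{n} \left( \frac{1}{2} \right)^{|nm|}. \]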


Hence the result follows. 


By a similar analysis of other types of correctness formulas, it becomes clear 
that closed form solutions to reliability estimation problems become intractable 
for anything other than simple kinds of formulas. For practical testing problems, 
Monte Carlo simulation (see e.g. [Bouleau 1994]) seems to be a necessary tool
to estimate reliability after n tests, even for such a simple distribution as the 
exponential random walk. 
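
A minimal Monte Carlo sketch of such an estimate is given below. It is an illustration only, under several assumptions of ours: the step law is taken to be $P(X = \pm n) = (1/3)(1/2)^n$, the conditioning on the endpoint is done by naive rejection sampling, and the function names and the seeding scheme are hypothetical.

```python
import random


def sample_step(rng):
    """Draw one step X with P(X = 0) = 1/3 and P(X = +n) = P(X = -n) = (1/3)(1/2)^n."""
    if rng.random() < 1 / 3:
        return 0
    magnitude = 1
    while rng.random() < 0.5:  # P(magnitude = n) = (1/2)^n for n >= 1
        magnitude += 1
    return magnitude if rng.random() < 0.5 else -magnitude


def estimate_failure_probability(n, m, c, samples, seed=0):
    """Estimate P(S_i != m*i + c for some 0 < i < n | S_0 = c, S_n = m*n + c).

    Walks that miss the required endpoint are rejected, so the estimate is
    only meaningful when enough walks are accepted.
    """
    rng = random.Random(seed)
    accepted = failures = 0
    for _ in range(samples):
        path = [c]
        for _ in range(n):
            path.append(path[-1] + sample_step(rng))
        if path[-1] != m * n + c:
            continue
        accepted += 1
        if any(path[i] != m * i + c for i in range(1, n)):
            failures += 1
    return failures / accepted if accepted else float("nan")
```

For n = 2 and m = 0 the exact value under the assumed step law is 1 - (1/9)/(5/27) = 2/5, which gives a simple check of the estimator. Rejection sampling becomes inefficient as n or |m| grows, so a practical tool would need a better way of conditioning on the endpoints.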

Clearly, the results of this section depend on specific properties of the exponential
random walk. This distribution model represents a naive but mathematically
tractable model of reality. An open question for future research is to find
more realistic models of program distributions. Of course, more accurate models
would lead to different results from those presented here.


5 Test Case Generation (TCG) 


In Section 4 we considered the problem of stopping the testing process with some
quantitative conclusion about the reliability of a program after n tests have been
passed. In this section we consider how to apply our stochastic model to the actual
testing phase that precedes termination. How can we use the stochastic
approach to efficiently generate test cases that can effectively uncover errors?
We have already seen in Section 4 that calculations of correctness probabilities
may be computationally expensive. However, for certain kinds of probability
distributions an approach to TCG can be developed from outside probability
theory, using classical function approximation theory, with the advantage of
speed.

For clarity of exposition, we will again deal with the case of a program interface
consisting of a single input/output variable $x \in X$. Furthermore, we will
generalise from probability measures to arbitrary finite measures at this point.
(Recall that every probability measure is a measure, but not vice-versa.)

Definition 12. Let $M : [\,[m,n] \to D\,] \to \mathbb{R}^+$ be a measure for $D \subseteq \mathbb{Z}$. (If the
codomain of M is [0,1] then M is an FDD.) Then M is elective if, and only if,
for any $(a_1, b_1), \ldots, (a_k, b_k) \in [m,n] \times D$ the set

\[ F_{(a_1, b_1), \ldots, (a_k, b_k)} = \{\, g : [m,n] \to D \mid g(a_i) = b_i \text{ for } 1 \leq i \leq k \,\} \]

has a unique maximum member under M, i.e. there exists $f \in F_{(a_1, b_1), \ldots, (a_k, b_k)}$
such that for all $g \in F_{(a_1, b_1), \ldots, (a_k, b_k)}$

\[ g \neq f \implies M(g) < M(f). \]


To understand Definition 12, suppose that $b_1, \ldots, b_k \in D$ are the results of
executing a program $\omega[x]$ on the test inputs $a_1, \ldots, a_k \in [m,n]$ respectively.
Then an elective measure M gives for the input/output pairs

\[ (a_1, b_1), \ldots, (a_k, b_k) \]

a unique "most likely candidate" for a function f extending these pairs to the
entire interval [m,n]. This candidate function f, which is the maximum member
$f \in F_{(a_1, b_1), \ldots, (a_k, b_k)}$ under M, represents a "best" guess of what the partially
known system under test might look like in its entirety.
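
On a very small interval and value set, electiveness can be explored by brute force. The sketch below is illustrative only: it enumerates every function consistent with the observed pairs and returns the unique maximiser of a given measure, or `None` when the maximum is tied. The measure used in the example is the path probability of the exponential random walk under the assumed step law $P(X = \pm n) = (1/3)(1/2)^n$; all names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def candidates(lo, hi, values, observations):
    """All functions g : [lo, hi] -> values with g(a) = b for each observed pair."""
    points = range(lo, hi + 1)
    for outputs in product(values, repeat=len(points)):
        g = dict(zip(points, outputs))
        if all(g[a] == b for a, b in observations.items()):
            yield g


def elect(lo, hi, values, observations, measure):
    """Return the unique maximum member under `measure`, or None if it is not unique."""
    best, best_value, tied = None, None, False
    for g in candidates(lo, hi, values, observations):
        value = measure(g)
        if best_value is None or value > best_value:
            best, best_value, tied = g, value, False
        elif value == best_value:
            tied = True
    return None if tied else best


def random_walk_measure(g):
    """(1/3)^n * (1/2)^volatility of the function read as a path over its domain."""
    ys = [g[x] for x in sorted(g)]
    n = len(ys) - 1
    lam = sum(abs(b - a) for a, b in zip(ys, ys[1:]))
    return Fraction(1, 3) ** n * Fraction(1, 2) ** lam
```

With observations (0, 0) and (2, 0) on the interval [0, 2], the constant function is the unique maximiser. With observations (0, 0) and (2, 2), the three monotone extensions tie, so this particular measure does not elect a unique function for that data.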

An elective measure $M : [\,[m,n] \to D\,] \to \mathbb{R}^+$ gives rise to an iterative
test case generation procedure in the following way. Given k executed test cases
$a_1, \ldots, a_k \in [m,n]$ for a program $\omega[x]$ with results $b_1, \ldots, b_k \in D$, we can
consider the unique elected function $f_k \in F_{(a_1, b_1), \ldots, (a_k, b_k)}$ as a model of $\omega[x]$.
By analysing $f_k$ we may be able to locate a new test case $a_{k+1} \in [m,n]$ such that
$a_{k+1}$ satisfies a precondition p but $f_k(a_{k+1})$ does not satisfy a postcondition q.
Then $a_{k+1}$ is a promising new test case to execute on $\omega[x]$. If no such $a_{k+1}$ exists
we can use some other choice criterion (e.g. random) for the (k + 1)-th test, and
hope for a more promising test case later as the sequence of elected functions
$f_k$, $k \geq 1$, converges to the actual input/output behaviour of $\omega[x]$.
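
The iterative procedure can be sketched as follows. This is a minimal illustration under our own assumptions, not the algorithm of the paper: the elected function is taken to be a piecewise-linear interpolant of the observed pairs, rounded to integers and held constant outside the observed range; inputs range over the integers of [lo, hi]; and the names `generate_tests`, `pre` and `post` are hypothetical.

```python
import random


def piecewise_linear_model(observations):
    """Model f_k: rounded piecewise-linear interpolant of the observed pairs."""
    xs = sorted(observations)

    def f(x):
        if x <= xs[0]:
            return observations[xs[0]]
        if x >= xs[-1]:
            return observations[xs[-1]]
        for left, right in zip(xs, xs[1:]):
            if left <= x <= right:
                yl, yr = observations[left], observations[right]
                return round(yl + (yr - yl) * (x - left) / (right - left))

    return f


def generate_tests(program, pre, post, lo, hi, max_tests, seed=0):
    """Run tests until a failure is observed or the budget is spent.

    Returns (failing_input or None, observations).
    """
    rng = random.Random(seed)
    observations = {}
    next_input = lo
    while next_input is not None and len(observations) < max_tests:
        output = program(next_input)
        observations[next_input] = output
        if pre(next_input) and not post(next_input, output):
            return next_input, observations
        model = piecewise_linear_model(observations)
        untested = [x for x in range(lo, hi + 1) if x not in observations]
        # Promising inputs: the model predicts a violation of the postcondition.
        promising = [x for x in untested if pre(x) and not post(x, model(x))]
        if promising:
            next_input = promising[0]
        elif untested:
            next_input = rng.choice(untested)  # fallback choice criterion
        else:
            next_input = None
    return None, observations
```

The choice of interpolant is the main design decision: a model that extrapolates poorly, as this one does, proposes many tests that the program then passes, while a better approximation concentrates tests where the model and the specification genuinely disagree.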

The fundamental technical problem for this approach to TCG is to find a suitable
elective measure M. One pragmatic solution to this problem is introduced in
[Meinke 2004] using function approximation theory. Specifically, an interpolant
(usually a local interpolant) of $(a_1, b_1), \ldots, (a_k, b_k) \in [m,n] \times D$ is chosen as
the elected function. In [Meinke 2004] piecewise polynomials were investigated
as local interpolants. Many other classes of approximating functions are known
in the literature such as splines, wavelets, radial basis functions, etc. Thus the
technique gives a rich source of algorithms.

Our main result in this section is to show that for a large class of approxima- 
tion methods, the function approximation approach (which is non-probabilistic, 
and fast to the extent that interpolants can be efficiently computed and evalu- 
ated) is equivalent to the measure theoretic approach. 


Definition 13. Let $D \subseteq \mathbb{Z}$ be any subset. An interpolation scheme is a mapping
$I : \wp([m,n] \times D) \to [\,[m,n] \to D\,]$ such that for all $1 \leq i \leq k$

\[ I( \{ (a_1, b_1), \ldots, (a_k, b_k) \} )(a_i) = b_i. \]

We will show that a large class of interpolation schemes, including polynomial
interpolation, actually give rise to elective measures, and even elective probability
measures. Thus the approximation approach can be seen as a special case
of the stochastic approach, where the FDDs are implicit, but can be efficiently
computed.


Definition 14. Let $\mu : \wp(\wp([m,n])) \to \mathbb{R}^+$ be a finite measure (not necessarily a
probability measure). Let

\[ I : \wp([m,n] \times D) \to [\,[m,n] \to D\,] \]

be an interpolation scheme. Define the measure

\[ \mu^I : [\,[m,n] \to D\,] \to \mathbb{R}^+ \]

by

\[ \mu^I(f) = \mu( \{\, \{ a_{i_1}, \ldots, a_{i_k} \} \subseteq [m,n] \mid I[ \{ (a_{i_1}, f(a_{i_1})), \ldots, (a_{i_k}, f(a_{i_k})) \} ] = f \,\} ). \]

Proposition 4. If $D \subseteq \mathbb{Z}$ is finite then $\mu$ can be defined so that $\mu^I$ is a probability
measure for any interpolation scheme I.


Proof. By construction. 


Definition 15. Let $M : [\,[m,n] \to D\,] \to \mathbb{R}^+$ be an elective measure. Define
the interpolation scheme $I^M : \wp([m,n] \times D) \to [\,[m,n] \to D\,]$ by

\[ I^M[ \{ (a_0, b_0), \ldots, (a_k, b_k) \} ] = f, \]

where $f \in \{\, g : [m,n] \to D \mid g(a_i) = b_i \text{ for } 0 \leq i \leq k \,\}$ is the unique element
for which M(f) is maximum.


Definition 16. Let $I : \wp([m,n] \times D) \to [\,[m,n] \to D\,]$ be an interpolation
scheme.

(i) I is monotone if, and only if, $I[ \{ (a_0, b_0), \ldots, (a_k, b_k) \} ] = f$ and
$\{ (a'_0, b'_0), \ldots, (a'_l, b'_l) \} \subseteq f$ and $\{ a_0, \ldots, a_k \} \subseteq \{ a'_0, \ldots, a'_l \}$ imply

\[ I[ \{ (a'_0, b'_0), \ldots, (a'_l, b'_l) \} ] = f. \]

(ii) I is permutable if, and only if, for any $\{ a_1, \ldots, a_k \} \subseteq [m,n]$ and
$f : [m,n] \to D$, if $I[ \{ (a_1, f(a_1)), \ldots, (a_k, f(a_k)) \} ] = f$ then for any
$\{ a'_1, \ldots, a'_k \} \subseteq [m,n]$

\[ I[ \{ (a'_1, f(a'_1)), \ldots, (a'_k, f(a'_k)) \} ] = f. \]

Example 2. Polynomial approximation is monotone and permutable.

Theorem 3. Let $I : \wp([m,n] \times D) \to [\,[m,n] \to D\,]$ be a monotone permutable
interpolation scheme. Then $\mu^I$ is elective and $I^{\mu^I} = I$.


Proof. Consider any $\{ (a_1, b_1), \ldots, (a_n, b_n) \} \subseteq [m,n] \times D$ and suppose

\[ I[ \{ (a_1, b_1), \ldots, (a_n, b_n) \} ] = f, \]

then we need to show that

\[ I^{\mu^I}[ \{ (a_1, b_1), \ldots, (a_n, b_n) \} ] = f. \]

It suffices to show that for any $g : [m,n] \to D$ such that $g(a_i) = b_i$ for $1 \leq i \leq n$
and $g \neq f$, $\mu^I(g) < \mu^I(f)$, i.e. $\mu^I$ is elective. Consider any such g and any
$\{ a'_1, \ldots, a'_k \} \subseteq [m,n]$ and suppose that

\[ I[ \{ (a'_1, g(a'_1)), \ldots, (a'_k, g(a'_k)) \} ] = g. \]

Since I is permutable we must have k > n.
Since I is an interpolation scheme

\[ I[ \{ (a_1, f(a_1)), \ldots, (a_n, f(a_n)) \} ] = f. \]

Then since I is permutable

\[ I[ \{ (a'_1, f(a'_1)), \ldots, (a'_n, f(a'_n)) \} ] = f. \]

Finally since I is monotone and k > n,

\[ I[ \{ (a'_1, f(a'_1)), \ldots, (a'_k, f(a'_k)) \} ] = f. \]

Therefore

\[ \{\, \{ a_1, \ldots, a_k \} \subseteq [m,n] \mid I[ \{ (a_1, g(a_1)), \ldots, (a_k, g(a_k)) \} ] = g \,\} \subseteq \{\, \{ a_1, \ldots, a_k \} \subseteq [m,n] \mid I[ \{ (a_1, f(a_1)), \ldots, (a_k, f(a_k)) \} ] = f \,\}. \]

Thus since $\mu$ is a measure, $\mu^I(f) \geq \mu^I(g)$, i.e. $\mu^I$ is elective.


6 Conclusions 


In this paper we have introduced a stochastic model for black-box testing of the 
functional correctness of programs. This model allows us to derive a probability 
value for the validity of a correctness formula of the form $\{p\}\,\omega\,\{q\}$ conditional on
the results of any finite set of black-box tests on $\omega$. It corrects technical problems
with similar models occurring previously in the literature. Our model provides a
solution to the difficult problem of measuring coverage in black-box testing. It 
also suggests new approaches to the test case generation process itself. 

Further research is necessary to establish accurate models of the probabilistic
distribution of programs. Furthermore, we may generalise our model to consider 
how program distributions are influenced by the choice of the programming 
problem to be solved (the precondition p and postcondition q). This would give 
a theoretical model of the competent programmer hypothesis of [Budd 1980]. 
This also requires consideration of the difficult problem of non-termination. For 
example, it may be necessary to introduce a non-functional time requirement 
into specifications, in order to abort a test that can never terminate. Research 
into other abstract data types and concrete data structures also presents an 
important problem in this area. 

Much of this research was carried out during a sabbatical visit to the Department
of Computer Science and Engineering at the University of California at
San Diego (UCSD) during 2003. We gratefully acknowledge the support of the
Department, and in particular the helpful comments and advice received from 
Joseph Goguen and the members of the Meaning and Computation group. We 
also acknowledge the financial support of TFR grant 2000-447. 


References 


[de Bakker 1980] J.W. de Bakker, Mathematical Theory of Program Correctness,
Prentice-Hall, 1980.

[Barwise 1968] J. Barwise (ed), The Syntax and Semantics of Infinitary Languages,
Lecture Notes in Mathematics 72, Springer-Verlag, Berlin, 1968.

[Bauer 1981] H. Bauer, Probability Theory and Elements of Measure Theory, Academic
Press, London, 1981.

[Beizer 1995] B. Beizer, Black-Box Testing, John Wiley, 1995.

[Bouleau 1994] N. Bouleau, D. Lepingle, Numerical Methods for Stochastic Processes,
John Wiley, New York, 1994.

[Budd 1980] T.A. Budd, R.A. DeMillo, R.J. Lipton, F.G. Sayward, Theoretical and
Empirical Studies on Using Program Mutation to Test the Functional Correctness of
Programs, Proc. 7th ACM SIGPLAN-SIGACT Symp. on Principles of Programming
Languages, 220-223, 1980.

[Graham et al. 1989] R.L. Graham, D.E. Knuth and O. Patashnik, Concrete Mathematics,
Addison-Wesley, Reading Mass., 1989.

[Grimmet et al. 1982] G. Grimmett, D. Stirzaker, Probability and Random Processes,
Oxford University Press, 1982.

[Hamlet 1987] R.G. Hamlet, Probable Correctness Theory, Inf. Proc. Letters 25, 17-25,
1987.

[Kallenberg 1997] O. Kallenberg, Foundations of Modern Probability, Springer Verlag,
1997.

[Lee and Yannakakis 1996] D. Lee, M. Yannakakis, Principles and Methods of Testing
Finite State Machines - a Survey, Proc. IEEE, 84 (8), 1090-1123, 1996.

[Loeckx et al. 1996] J. Loeckx, H-D. Ehrich, M. Wolf, Specification of Abstract Data
Types, Wiley Teubner, Chichester 1996.

[Meinke 2004] K. Meinke, Automated Black-Box Testing of Functional Correctness
using Function Approximation, pp 143-153 in: G. Rothermel (ed) Proc. ACM SIGSOFT
Int. Symp. on Software Testing and Analysis, ISSTA 2004, Software Engineering
Notes 29 (4), ACM Press, 2004.











[Miller et al. 1992] K.W. Miller, L.J. Morell, R.E. Noonan, S.K. Park, D.M. Nicol,
B.W. Murrill, J.M. Voas, Estimating the Probability of Failure when Testing Reveals
no Failures, IEEE Trans. Soft. Eng. 18 (1), 33-43, 1992.

[Moore 1956] E.F. Moore, Gedanken-experiments on Sequential Machines, Princeton
Univ. Press, Ann. Math. Studies, 34, 129-153, Princeton NJ, 1956.

[Thayer et al. 1978] T.A. Thayer, M. Lipow, E.C. Nelson, Software Reliability, North
Holland, New York, 1978.

[Weiss and Weyuker 1988] S.N. Weiss, E.J. Weyuker, An Extended Domain-Based
Model of Software Reliability, IEEE Trans. Soft. Eng. 14 (10), 1512-1524, 1988.


Some Tips on Writing Proof Scores 
in the OTS/CafeOBJ Method 


Kazuhiro Ogata$^{1,2}$ and Kokichi Futatsugi$^{2}$


1 NEC Software Hokuriku, Ltd. 
ogatak@acm.org 
2 Japan Advanced Institute of Science and Technology (JAIST) 
kokichi@jaist.ac.jp 


Abstract. The OTS/CafeOBJ method is an instance of the proof score
approach to systems analysis, which has been developed mainly by
researchers in the OBJ community. We describe some tips on writing proof
scores in the OTS/CafeOBJ method and use a mutual exclusion protocol
to exemplify the tips. We also argue for the soundness of proof scores in the
OTS/CafeOBJ method.


1 Introduction 


The proof score approach to systems analysis has been developed mainly by
researchers in the OBJ community [10,8]. In the approach, an executable algebraic
specification language is used to specify systems and system properties, and a
processor of the language, which has a rewrite engine as one of its functionalities,
is used as a proof assistant to prove that systems satisfy system properties. Proof
plans called proof scores are written in the algebraic specification language to
conduct such proofs and the proof scores are executed by the language processor
by means of rewriting to check whether the proofs are successful.

Proof scores can be regarded as programs to prove that algebraic specifications
satisfy system properties. While proof scores are being designed, constructed
and debugged, we can understand the algebraic specifications being analyzed
more profoundly, which may even let us find flaws lurking in the specifications
[15,14]. Our thought on proof is similar to that of the designers of LP [11].
Proof scripts written in a tactic language provided by proof assistants such as
Coq [1] and Isabelle/HOL [13] may be regarded as such programs, but it seems
that such proof assistants rather aim for mechanizing mathematics.

We have argued that the proof score approach to systems analysis is an 
attractive approach to design verification in [6] thanks to (1) balanced human- 
computer interaction and (2) flexible but clear structure of proof scores. The 
former means that humans are able to focus on proof plans, while tedious and 
detailed computations can be left to computers; humans do not necessarily have 
to know what deductive rules or equations should be applied to goals to prove. 
The latter means that lemmas do not need to be proved in advance and proof 
scores can help humans comprehend the corresponding proofs; a proof that a 
system satisfies a system property can be conducted even when not all of the lemmas
used have been proved, and assumptions used are explicitly and clearly written
in proof scores. To precisely assess the achievement of (1) and (2) in the proof 
score approach and compare it with systems analysis with other existing proof 
assistants, however, we need further studies. 

The OTS/CafeOBJ method [17,4,7] is an instance of the proof score approach
to systems analysis. In the OTS/CafeOBJ method, observational transition systems
(OTSs) are used as models of systems and CafeOBJ [2], an executable algebraic
specification language/system, is used; OTSs are transition systems, which
are straightforwardly written as algebraic specifications. An older version of the
OTS/CafeOBJ method is described in [17,4], and the latest version is described
in [7]. We have conducted case studies, among which are [15,18,19,16,20], to
demonstrate the usefulness of the OTS/CafeOBJ method. In this paper, we describe
some tips on writing proof scores in the OTS/CafeOBJ method. A mutual
exclusion protocol called Tlock using atomicInc, which atomically increments the
number stored in a variable and returns the old number, is used as an example.
We also argue for the soundness of proof scores in the OTS/CafeOBJ method.

The rest of the paper is organized as follows. Section 2 describes the
OTS/CafeOBJ method. Section 3 describes tips on writing proof scores in the
OTS/CafeOBJ method. Section 4 informally argues for the soundness of proof scores in
the OTS/CafeOBJ method. Section 5 concludes the paper.


2 The OTS/CafeOBJ Method 


In the OTS/CafeOBJ method, systems are analyzed as follows. 


1. Model a system as an OTS S.

2. Write S in CafeOBJ as an algebraic specification. The specification consists
of sorts (or types), operators on the sorts, and equations that define (properties
of) the operators. The specification can be executed by CafeOBJ, using the
equations as left-to-right rewrite rules.

3. Write system properties in CafeOBJ. Let P be the set of such system properties
and let P' be the empty set.

4. If P is empty, the analysis has finished successfully, which means that
S satisfies all the properties in P'. Otherwise, extract a property p from P
and go to the next step.

5. Write a proof score in CafeOBJ to prove that S satisfies p. The proof may
need other system properties as lemmas. Write such system properties in
CafeOBJ and put those that are not in P' into P.

6. Execute (or play) the proof score with CafeOBJ. If all the results are as
expected, then the proof is discharged. Put p into P' and go to step 4. If not all
the results are as expected, rewrite the proof score and repeat step 6.


Tasks 5 and 6 may be interactively conducted together. A counterexample may
be found in tasks 5 and 6.
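
The control flow of steps 3 to 6 can be sketched as a simple worklist loop. This is only an illustration of the bookkeeping with P and P'; writing and debugging a proof score is human work, which the sketch hides behind a hypothetical callback `write_and_play_proof_score` that is assumed to return the lemmas used once the proof score for a property plays successfully. Unlike the text, the sketch also skips lemmas that are already waiting in P, to avoid duplicates.

```python
def analyse(initial_properties, write_and_play_proof_score):
    """Return the properties proved, in the order in which they were discharged."""
    pending = list(initial_properties)  # the set P
    proved = []                         # the set P'
    while pending:                      # step 4: stop when P is empty
        p = pending.pop(0)
        lemmas = write_and_play_proof_score(p)  # steps 5 and 6
        for lemma in lemmas:
            # Step 5: lemmas not yet in P' are put into P.
            if lemma != p and lemma not in proved and lemma not in pending:
                pending.append(lemma)
        proved.append(p)                # step 6: put p into P'
    return proved
```

For instance, if a property `"MX"` needs a lemma `"inv2"`, and `"inv2"` in turn needs `"MX"` and `"inv3"` (all names hypothetical), the loop discharges the three properties once each.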

In this section, we introduce CafeOBJ, give the definitions of basic concepts
on OTSs, and explain how to write OTSs in CafeOBJ and how to write proof
scores showing that OTSs satisfy invariant properties in CafeOBJ.


2.1 CafeOBJ 


CafeOBJ [2] is an algebraic specification language/system mainly based on
order-sorted algebras and hidden algebras [9,3]. Abstract data types are specified in
terms of order-sorted algebras, and abstract machines are specified in terms of 
hidden algebras. Algebraic specifications of abstract machines are called behav- 
ioral specifications. There are two kinds of sorts in CafeOBJ: visible sorts and 
hidden sorts. A visible sort denotes an abstract data type, while a hidden sort 
denotes the state space of an abstract machine. There are three kinds of opera- 
tors (or operations) with respect to (wrt) hidden sorts: hidden constants, action 
operators and observation operators. Hidden constants denote initial states of ab- 
stract machines, action operators denote state transitions of abstract machines, 
and observation operators let us know the situation where abstract machines are 
located. Both an action operator and an observation operator take a state of an 
abstract machine and zero or more data. The action operator returns the suc- 
cessor state of the state wrt the state transition denoted by the action operator 
plus the data. The observation operator returns a value that characterizes the 
situation where the abstract machine is located. 

Basic units of CafeOBJ specifications are modules. CafeOBJ provides built-in 
modules. One of the most important built-in modules is BOOL in which proposi- 
tional logic is specified. BOOL is automatically imported by almost every module 
unless otherwise stated. In BOOL and its parent modules, declared are the visible 
sort Bool, the constants true and false of Bool, and operators denoting some 
basic logical connectives. Among the operators are not_, _and_, _or_, _xor_,
_implies_ and _iff_ denoting negation ($\neg$), conjunction ($\wedge$), disjunction ($\vee$),
exclusive disjunction (xor), implication ($\Rightarrow$) and logical equivalence ($\Leftrightarrow$),
respectively. The operator if_then_else_fi corresponding to the if construct in
programming languages is also declared. CafeOBJ uses the Hsiang term rewriting
system (TRS) as the decision procedure for propositional logic, which is
implemented in BOOL. CafeOBJ reduces any term denoting a proposition that is 
always true (false) to true (false). More generally, a term denoting a proposi- 
tion reduces to an exclusively disjunctive normal form of the proposition. 
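
To see why an exclusive-or normal form yields a decision procedure, consider the following sketch. It is not CafeOBJ's implementation and does not reproduce the rewrite rules of BOOL; it only illustrates the underlying idea that every proposition can be written as an exclusive disjunction of conjunctions of variables, and that this form is unique, so tautologies collapse to true and contradictions to false. A proposition is represented as a set of monomials, each monomial being a set of variable names; all names are our own.

```python
TRUE = frozenset({frozenset()})  # the single empty conjunction, i.e. true
FALSE = frozenset()              # the empty exclusive disjunction, i.e. false


def var(name):
    """A propositional variable as a one-monomial polynomial."""
    return frozenset({frozenset({name})})


def xor(p, q):
    """Exclusive disjunction: monomials occurring in both operands cancel."""
    return p ^ q


def conj(p, q):
    """Conjunction: multiply out, using x and x = x and cancelling duplicates."""
    result = set()
    for a in p:
        for b in q:
            result ^= {a | b}
    return frozenset(result)


def neg(p):
    return xor(p, TRUE)


def disj(p, q):
    return xor(xor(p, q), conj(p, q))


def implies(p, q):
    return disj(neg(p), q)


def iff(p, q):
    return neg(xor(p, q))
```

With `p = var("p")` and `q = var("q")`, the term `disj(p, neg(p))` evaluates to `TRUE`, `conj(p, neg(p))` to `FALSE`, and `disj(p, q)` to the three-monomial form p xor q xor (p and q).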


2.2 Observational Transition Systems (OTSs) 


We suppose that there exists a universal state space denoted $\Upsilon$ and that each
data type used in OTSs is provided. The data types include Bool for truth values.
A data type is denoted $D_*$.


Definition 1 (OTSs). An OTS S is $\langle \mathcal{O}, \mathcal{I}, \mathcal{T} \rangle$ such that

– $\mathcal{O}$: A finite set of observers. Each observer $o_{x_1:D_{o1},\ldots,x_m:D_{om}} : \Upsilon \to D_o$
is an indexed function that has m indexes $x_1, \ldots, x_m$ whose types are
$D_{o1}, \ldots, D_{om}$. The equivalence relation ($\upsilon_1 =_S \upsilon_2$) between two states
$\upsilon_1, \upsilon_2 \in \Upsilon$ is defined as $\forall o_{x_1,\ldots,x_m} : \mathcal{O}.\ (o_{x_1,\ldots,x_m}(\upsilon_1) = o_{x_1,\ldots,x_m}(\upsilon_2))$, where
$\forall o_{x_1,\ldots,x_m} : \mathcal{O}$ is the abbreviation of $\forall o_{x_1:D_{o1},\ldots,x_m:D_{om}} : \mathcal{O}.\ \forall x_1 : D_{o1} \ldots \forall x_m : D_{om}$.

– $\mathcal{I}$: The set of initial states such that $\mathcal{I} \subseteq \Upsilon$.

– $\mathcal{T}$: A finite set of transitions. Each transition $t_{y_1:D_{t1},\ldots,y_n:D_{tn}} : \Upsilon \to \Upsilon$ is an
indexed function that has n indexes $y_1, \ldots, y_n$ whose types are $D_{t1}, \ldots, D_{tn}$
provided that $t_{y_1,\ldots,y_n}(\upsilon_1) =_S t_{y_1,\ldots,y_n}(\upsilon_2)$ for each $[\upsilon] \in \Upsilon/{=_S}$, each $\upsilon_1, \upsilon_2 \in [\upsilon]$
and each $y_k : D_{tk}$ for $k = 1, \ldots, n$. $t_{y_1,\ldots,y_n}(\upsilon)$ is called the successor state
of $\upsilon$ wrt S. Each transition $t_{y_1,\ldots,y_n}$ has the condition $c\text{-}t_{y_1:D_{t1},\ldots,y_n:D_{tn}} : \Upsilon \to$
Bool, which is called the effective condition of the transition. If $c\text{-}t_{y_1,\ldots,y_n}(\upsilon)$
does not hold, then $t_{y_1,\ldots,y_n}(\upsilon) =_S \upsilon$.



We note the following two points on transitions, which have something to
do with writing proof scores. (1) Although transitions are defined as relations
among states in some other existing transition systems, transitions are functions
on states in OTSs. This is because transitions are represented by (action)
operators in behavioral specifications and operators are functions in CafeOBJ.
However, multiple transitions that are functions on states can be substituted for
one transition that is a relation among states. (2) Basically there is no restriction
on the form of effective conditions. But, effective conditions should be in
the form $c_1\text{-}t_{y_1,\ldots,y_n}(\upsilon) \wedge \ldots \wedge c_M\text{-}t_{y_1,\ldots,y_n}(\upsilon)$, where each $c_k\text{-}t_{y_1,\ldots,y_n}(\upsilon)$ has no
logical connectives or has one negation at head, so that proof scores can have
clear structure. When an effective condition is not in this form, it is converted to
a disjunctive normal form. If the disjunctive normal form has more than one disjunct,
multiple transitions each of which has one of the disjuncts as its effective
condition can be substituted for the corresponding transition.



Definition 2 (Reachable states). Given an OTS S, reachable states wrt S
are inductively defined:

– Each $\upsilon_{init} \in \mathcal{I}$ is reachable wrt S.
– For each $t_{y_1,\ldots,y_n} \in \mathcal{T}$ and each $y_k : D_{tk}$ for $k = 1, \ldots, n$, $t_{y_1,\ldots,y_n}(\upsilon)$ is
reachable wrt S if $\upsilon \in \Upsilon$ is reachable wrt S.














Let $R_S$ be the set of all reachable states wrt S.


Predicates whose types are $\Upsilon \to$ Bool are called state predicates. All properties
considered in this paper are invariants.


Definition 3 (Invariants). Any state predicate $p : \Upsilon \to$ Bool is called invariant
wrt S if p holds in all reachable states wrt S, i.e. $\forall \upsilon : R_S.\ p(\upsilon)$.














We suppose that each state predicate p considered in this paper has the form
$\forall z_1 : D_{p1} \ldots \forall z_a : D_{pa}.\ P(\upsilon, z_1, \ldots, z_a)$, where $\upsilon, z_1, \ldots, z_a$ are all variables in p
and $P(\upsilon, z_1, \ldots, z_a)$ does not contain any quantifiers.

A concrete example of how to model a system as an OTS is given. 


Example 1 (Tlock). The pseudo-code executed by each process i can be written
as follows:

Loop
  l1: ticket[i] := atomicInc(tvm);
  l2: repeat until ticket[i] = turn;
  Critical section;
  cs: turn := turn + 1;


tvm and turn are non-negative integer variables shared by all processes and
ticket[i] is a non-negative integer variable that is local to process i. Initially,
each process i is at label l1, tvm and turn are 0, and ticket[i] for each i is
unspecified. The value of tvm (which stands for a ticket vending machine) is the
next available ticket. Each process i obtains a ticket, which is stored in ticket[i],
at label l1. A process is allowed to enter the critical section if its ticket equals the
value of turn at label l2. turn is incremented when a process leaves the critical
section at label cs.

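Let Label, Pid and Nat be the types of labels (l1, l2 and cs), process IDs
and non-negative integers (natural numbers). Tlock can be modeled as the OTS
$S_{Tlock}$ such that

– $\mathcal{O}_{Tlock} \triangleq \{ tvm : \Upsilon \to Nat,\ turn : \Upsilon \to Nat,\ ticket_{i:Pid} : \Upsilon \to Nat,\ pc_{i:Pid} : \Upsilon \to Label \}$

– $\mathcal{I}_{Tlock} \triangleq \{ \upsilon_{init} \in \Upsilon \mid tvm(\upsilon_{init}) = 0 \wedge turn(\upsilon_{init}) = 0 \wedge \forall i : Pid.\ (pc_i(\upsilon_{init}) = l1) \}$

– $\mathcal{T}_{Tlock} \triangleq \{ get_{i:Pid} : \Upsilon \to \Upsilon,\ check_{i:Pid} : \Upsilon \to \Upsilon,\ exit_{i:Pid} : \Upsilon \to \Upsilon \}$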


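The three transitions are defined as follows:

– $get_i$ : $c\text{-}get_i(\upsilon) \triangleq pc_i(\upsilon) = l1$. If $c\text{-}get_i(\upsilon)$, then
$tvm(get_i(\upsilon)) \triangleq tvm(\upsilon) + 1$, $turn(get_i(\upsilon)) \triangleq turn(\upsilon)$,
$ticket_j(get_i(\upsilon)) \triangleq$ if i = j then $tvm(\upsilon)$ else $ticket_j(\upsilon)$, and
$pc_j(get_i(\upsilon)) \triangleq$ if i = j then l2 else $pc_j(\upsilon)$.
– $check_i$ : $c\text{-}check_i(\upsilon) \triangleq pc_i(\upsilon) = l2 \wedge ticket_i(\upsilon) = turn(\upsilon)$. If $c\text{-}check_i(\upsilon)$, then
$tvm(check_i(\upsilon)) \triangleq tvm(\upsilon)$, $turn(check_i(\upsilon)) \triangleq turn(\upsilon)$,
$ticket_j(check_i(\upsilon)) \triangleq ticket_j(\upsilon)$, and $pc_j(check_i(\upsilon)) \triangleq$ if i = j then cs else $pc_j(\upsilon)$.
– $exit_i$ : $c\text{-}exit_i(\upsilon) \triangleq pc_i(\upsilon) = cs$. If $c\text{-}exit_i(\upsilon)$, then
$tvm(exit_i(\upsilon)) \triangleq tvm(\upsilon)$, $turn(exit_i(\upsilon)) \triangleq turn(\upsilon) + 1$,
$ticket_j(exit_i(\upsilon)) \triangleq ticket_j(\upsilon)$, and $pc_j(exit_i(\upsilon)) \triangleq$ if i = j then l1 else $pc_j(\upsilon)$.

Let $MX(\upsilon)$ be $\forall i, j : Pid.\ [(pc_i(\upsilon) = cs \wedge pc_j(\upsilon) = cs) \Rightarrow i = j]$. $MX(\upsilon)$
is invariant wrt $S_{Tlock}$, i.e. $\forall \upsilon : R_{S_{Tlock}}.\ MX(\upsilon)$, although it may need to be
verified.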
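
Before attempting a proof, it can be useful to explore the model on a small instance. The sketch below encodes the Tlock OTS in Python for a fixed number of processes and checks mutual exclusion on all states reachable within a bounded number of transitions. This is a bounded search, not a proof of the invariant, and it makes assumptions of our own: a state is the tuple (tvm, turn, ticket, pc), process IDs are 0, 1, ..., and the unspecified initial tickets are arbitrarily set to 0.

```python
from collections import deque

L1, L2, CS = "l1", "l2", "cs"


def initial_state(num_processes):
    """tvm = turn = 0, every process at l1; initial tickets are arbitrary (0 here)."""
    return (0, 0, (0,) * num_processes, (L1,) * num_processes)


def get(state, i):
    tvm, turn, ticket, pc = state
    if pc[i] != L1:  # effective condition c-get does not hold
        return state
    return (tvm + 1, turn,
            ticket[:i] + (tvm,) + ticket[i + 1:],
            pc[:i] + (L2,) + pc[i + 1:])


def check(state, i):
    tvm, turn, ticket, pc = state
    if not (pc[i] == L2 and ticket[i] == turn):  # c-check does not hold
        return state
    return (tvm, turn, ticket, pc[:i] + (CS,) + pc[i + 1:])


def exit_(state, i):
    tvm, turn, ticket, pc = state
    if pc[i] != CS:  # c-exit does not hold
        return state
    return (tvm, turn + 1, ticket, pc[:i] + (L1,) + pc[i + 1:])


def mutual_exclusion(state):
    """MX: at most one process is at label cs."""
    return sum(1 for label in state[3] if label == CS) <= 1


def reachable_states(num_processes, depth):
    """All states reachable from the initial state in at most `depth` transitions."""
    start = initial_state(num_processes)
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        state, d = frontier.popleft()
        if d == depth:
            continue
        for transition in (get, check, exit_):
            for i in range(num_processes):
                successor = transition(state, i)
                if successor not in seen:
                    seen.add(successor)
                    frontier.append((successor, d + 1))
    return seen
```

A call such as `all(mutual_exclusion(s) for s in reachable_states(3, 8))` checks the property for three processes up to depth 8. The state space is infinite because tvm and turn grow without bound, which is one reason why a proof by induction is needed for the general claim.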














2.3 Specifying OTSs in CafeOBJ 


We suppose that a visible sort $V_*$ corresponding to each data type $D_*$ used in
OTSs and the related operators are provided. $X_k$ and $Y_k$ are CafeOBJ variables
corresponding to indexes $x_k$ and $y_k$ of observers and transitions, respectively.



The universal state space $\Upsilon$ is represented by a hidden sort, say H, declared
as *[H]* by enclosing it with *[ and ]*. Given an OTS S, an arbitrary initial
state is represented by a hidden constant, say init, each observer $o_{x_1,\ldots,x_m}$
is represented by an observation operator, say o, and each transition $t_{y_1,\ldots,y_n}$
is represented by an action operator, say t. The hidden constant init, the
observation operator o and the action operator t are declared as follows:


op init : -> H
bop o : H Vo1 ... Vom -> Vo
bop t : H Vt1 ... Vtn -> H


The keyword bop or bops is used to declare observation and action operators.
We suppose that the value returned by $o_{x_1,\ldots,x_m}$ in an arbitrary initial state
can be expressed as $f(x_1, \ldots, x_m)$. This is expressed by the following equation:


eq o(init,X1,...,Xm) = f(X1,...,Xm) .


f(X1,...,Xm) is the CafeOBJ term corresponding to $f(x_1, \ldots, x_m)$.

Each transition $t_{y_1,\ldots,y_n}$ is defined by describing what the value returned by
each observer $o_{x_1,\ldots,x_m}$ in the successor state becomes when $t_{y_1,\ldots,y_n}$ is applied in
a state $\upsilon$. When $c\text{-}t_{y_1,\ldots,y_n}(\upsilon)$ holds, this is expressed generally by a conditional
equation that has the form


ceq o(t(S,Y1,...,Yn),X1,...,Xm) = e-t(S,Y1,...,Yn,X1,...,Xm)
    if c-t(S,Y1,...,Yn) .


S is a CafeOBJ variable of H, corresponding to $\upsilon$. e-t(S,Y1,...,Yn,X1,...,Xm)
is the CafeOBJ term corresponding to the value returned by $o_{x_1,\ldots,x_m}$ in the
successor state denoted by t(S,Y1,...,Yn). c-t(S,Y1,...,Yn) is the CafeOBJ
term corresponding to $c\text{-}t_{y_1,\ldots,y_n}(\upsilon)$.

If $c\text{-}t_{y_1,\ldots,y_n}(\upsilon)$ always holds in any state $\upsilon$ or the value returned by $o_{x_1,\ldots,x_m}$
is not affected by applying $t_{y_1,\ldots,y_n}$ in any state $\upsilon$ (i.e. regardless of the truth
value of $c\text{-}t_{y_1,\ldots,y_n}(\upsilon)$), then a usual equation is used instead of a conditional
equation. The usual equation has the form


eq o(t(S,Y1,...,Yn),X1,...,Xm) = e-t(S,Y1,...,Yn,X1,...,Xm) .


e-t(S,Y1,...,Yn,X1,...,Xm) is o(S,X1,...,Xm) if the value returned by $o_{x_1,\ldots,x_m}$ is not
affected by applying $t_{y_1,\ldots,y_n}$ in any state.

When $c\text{-}t_{y_1,\ldots,y_n}(\upsilon)$ does not hold, $t_{y_1,\ldots,y_n}$ changes nothing, which is expressed
by a conditional equation that has the form


ceq t(S,Y1,...,Yn) = S if not c-t(S,Y1,...,Yn) .


We give the CafeOBJ specification of $S_{Tlock}$.


Example 2 (CafeOBJ specification of $S_{Tlock}$). $S_{Tlock}$ is specified in CafeOBJ as
the module TLOCK:

mod* TLOCK { pr(PNAT) pr(LABEL) pr(PID)
  *[Sys]*
  -- an arbitrary initial state
  op init : -> Sys
  -- observation operators
  bops tvm turn : Sys -> Nat
  bop ticket : Sys Pid -> Nat
  bop pc : Sys Pid -> Label
  -- action operators
  bops get check exit : Sys Pid -> Sys
  -- CafeOBJ variables
  var S : Sys
  vars I J : Pid
  -- init
  eq tvm(init) = 0 .
  eq turn(init) = 0 .
  eq pc(init,I) = l1 .
  -- get
  op c-get : Sys Pid -> Bool {strat: (0 1 2)}
  eq c-get(S,I) = (pc(S,I) = l1) .
  ceq tvm(get(S,I)) = s(tvm(S)) if c-get(S,I) .
  eq turn(get(S,I)) = turn(S) .
  ceq ticket(get(S,I),J)
      = (if I = J then tvm(S) else ticket(S,J) fi) if c-get(S,I) .
  ceq pc(get(S,I),J) = (if I = J then l2 else pc(S,J) fi) if c-get(S,I) .
  ceq get(S,I) = S if not c-get(S,I) .
  -- check
  op c-check : Sys Pid -> Bool {strat: (0 1 2)}
  eq c-check(S,I) = (pc(S,I) = l2 and ticket(S,I) = turn(S)) .
  eq tvm(check(S,I)) = tvm(S) .
  eq turn(check(S,I)) = turn(S) .
  eq ticket(check(S,I),J) = ticket(S,J) .
  ceq pc(check(S,I),J)
      = (if I = J then cs else pc(S,J) fi) if c-check(S,I) .
  ceq check(S,I) = S if not c-check(S,I) .
  -- exit
  op c-exit : Sys Pid -> Bool {strat: (0 1 2)}
  eq c-exit(S,I) = (pc(S,I) = cs) .
  eq tvm(exit(S,I)) = tvm(S) .
  ceq turn(exit(S,I)) = s(turn(S)) if c-exit(S,I) .
  eq ticket(exit(S,I),J) = ticket(S,J) .
  ceq pc(exit(S,I),J)
      = (if I = J then l1 else pc(S,J) fi) if c-exit(S,I) .
  ceq exit(S,I) = S if not c-exit(S,I) .
}


A comment starts with -- and terminates at the end of the line. PNAT, LABEL
and PID are the modules in which natural numbers, labels and process IDs
are specified. The keyword pr is used to import modules. The operator s of
s(tvm(S)) and s(turn(S)) is the successor function of natural numbers. The
keyword strat: is used to specify local strategies to operators [5]. The local
strategy (0 1 2) given to c-get indicates that when CafeOBJ meets a term
whose top is c-get, such as c-get(s,i), CafeOBJ should first try to rewrite the
whole term. If CafeOBJ does not find any rules with which the term is
rewritten, it evaluates the first and second arguments, s and i, in that order,
and tries to rewrite the whole term c-get(s',i') again, where s' and i' are the
results obtained by evaluating s and i.
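As a cross-check on the reading of TLOCK, the following minimal sketch mirrors the three transitions as functions on immutable states. It is written in Python, not CafeOBJ, and is not part of the original specification. The names `Sys`, `init`, `put`, `get`, `check` and `exit_`, the tuple encoding of the per-process observers, and the choice of 0 for the tickets that TLOCK leaves unspecified in the initial state are all assumptions made for illustration.

```python
from typing import NamedTuple, Tuple

L1, L2, CS = "l1", "l2", "cs"


class Sys(NamedTuple):
    """One state, described only by its observable values."""
    tvm: int
    turn: int
    ticket: Tuple[int, ...]
    pc: Tuple[str, ...]


def init(n: int) -> Sys:
    # ticket(init, I) is not fixed by TLOCK; 0 is an arbitrary choice here.
    return Sys(0, 0, (0,) * n, (L1,) * n)


def put(values, i, x):
    """Return a copy of the tuple with position i replaced by x."""
    return values[:i] + (x,) + values[i + 1:]


def get(s: Sys, i: int) -> Sys:
    if s.pc[i] != L1:                      # not c-get: the state is unchanged
        return s
    return Sys(s.tvm + 1, s.turn, put(s.ticket, i, s.tvm), put(s.pc, i, L2))


def check(s: Sys, i: int) -> Sys:
    if not (s.pc[i] == L2 and s.ticket[i] == s.turn):   # not c-check
        return s
    return s._replace(pc=put(s.pc, i, CS))


def exit_(s: Sys, i: int) -> Sys:
    if s.pc[i] != CS:                      # not c-exit
        return s
    return s._replace(turn=s.turn + 1, pc=put(s.pc, i, L1))
```

Each function returns its argument unchanged when the effective condition is false, which corresponds to the conditional equations of the form ceq t(S,I) = S if not c-t(S,I).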














2.4 Proof Scores of Invariants 


Although some invariants may be proved by rewriting and/or case splitting only,
we often need to use induction, especially simultaneous induction [7]. We then
describe how to verify $\forall v : \mathcal{R}_{\mathcal{S}}.\ p(v)$ by simultaneous induction by writing proof
scores in CafeOBJ based on the CafeOBJ specification of $\mathcal{S}$.

It is often impossible to prove $\forall v : \mathcal{R}_{\mathcal{S}}.\ p(v)$ alone. We then suppose that it is
possible to prove $\forall v : \mathcal{R}_{\mathcal{S}}.\ p(v)$ together with $N - 1$ other state predicates³, that
is, we prove $\forall v : \mathcal{R}_{\mathcal{S}}.\ (p_1(v) \land \ldots \land p_N(v))$, where $p_1$ is $p$. We suppose that each
$p_k$ has the form $\forall z_k : D_{p_k}.\ P_k(v, z_k)$ for $k = 1, \ldots, N$. Note that the method
described here can be used when $p_k$ has more than one universally quantified
variable. Let $v^c_{init}$ be an arbitrary initial state of $\mathcal{S}$, and then for the base case,
all we have to do is to prove


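$$\forall z_1 : D_{p_1}.\ P_1(v^c_{init}, z_1) \land \ldots \land \forall z_N : D_{p_N}.\ P_N(v^c_{init}, z_N) \qquad (1)$$

For each induction case (i.e. each $t_{y_1,\ldots,y_n} \in \mathcal{T}$), all we have to do is to prove

$$\forall z_1 : D_{p_1}.\ P_1(v^c, z_1) \land \ldots \land \forall z_N : D_{p_N}.\ P_N(v^c, z_N)$$
$$\Rightarrow\ \forall z_1 : D_{p_1}.\ P_1(t_{y^c_1,\ldots,y^c_n}(v^c), z_1) \land \ldots \land \forall z_N : D_{p_N}.\ P_N(t_{y^c_1,\ldots,y^c_n}(v^c), z_N) \qquad (2)$$

for an arbitrary state $v^c$ and an arbitrary value $y^c_k$ for $k = 1, \ldots, n$.
To prove (1), we can separately prove each conjunct

$$P_k(v^c_{init}, z^c_k) \qquad (3)$$

where $z^c_k$ is an arbitrary value of $D_{p_k}$ for $k = 1, \ldots, N$. To prove (2), assuming
$\forall z_1 : D_{p_1}.\ P_1(v^c, z_1), \ldots, \forall z_N : D_{p_N}.\ P_N(v^c, z_N)$, we can separately prove each
$P_k(t_{y^c_1,\ldots,y^c_n}(v^c), z^c_k)$, where $z^c_k$ is an arbitrary value of $D_{p_k}$, for $k = 1, \ldots, N$.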


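$P_k(v^c, z^c_k)$ is often used as an assumption to prove $P_k(t_{y^c_1,\ldots,y^c_n}(v^c), z^c_k)$. Therefore,
the formula to prove has the form

$$(P_\alpha(v^c, d_\alpha) \land P_\beta(v^c, d_\beta) \land \ldots) \Rightarrow [P_k(v^c, z^c_k) \Rightarrow P_k(t_{y^c_1,\ldots,y^c_n}(v^c), z^c_k)] \qquad (4)$$

where $\alpha, \beta, \ldots \in \{1, \ldots, N\}$ and $d_\alpha, d_\beta, \ldots$ are some values of $D_{p_\alpha}, D_{p_\beta}, \ldots$ for
$k = 1, \ldots, N$.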

We next describe how to write proof plans of (3) and (4) in CafeOBJ. We
first declare the operators denoting $P_1, \ldots, P_N$ and the equations defining the
operators. The operators and equations are declared in a module, say INV (which
imports the module where $\mathcal{S}$ is written), as follows:


op inv_k : H V_pk -> Bool
eq inv_k(S,Z_k) = P_k(S,Z_k) .

for k = 1,...,N. Z_k is a CafeOBJ variable of V_pk, and P_k(S,Z_k) is a CafeOBJ term
denoting $P_k(v, z_k)$. In INV, we also declare a constant z_k^c denoting an arbitrary
value of V_pk for k = 1,...,N. We then declare the operators denoting basic
formulas to prove in the induction cases and the equations defining the operators.
The operators and equations are declared in a module, say ISTEP (which imports
INV), as follows:

op istep_k : V_pk -> Bool
eq istep_k(Z_k) = inv_k(s,Z_k) implies inv_k(s',Z_k) .

for k = 1,...,N. s and s', which are declared in ISTEP, are constants of H. s
denotes an arbitrary state and s' denotes a successor state of the state.

³ Generally, such $N - 1$ state predicates should be found while $\forall v : \mathcal{R}_{\mathcal{S}}.\ p(v)$ is being
proved.

The proof plan of (3), written in CafeOBJ, has the form

open INV
  red inv_k(init,z_k^c) .
close

for k = 1,...,N. The command open makes a temporary module that imports
a given module and the command close destroys it. The command red reduces
a given term. CafeOBJ scripts like this constitute proof scores. Such fragments
of proof scores are called proof passages. When such a proof passage is fed into
the CafeOBJ system and the system returns true, the corresponding proof
is successfully done.

The proof of (4) often needs case splitting. We suppose that the state space
is split into $L_k$ sub-spaces⁴ in order to prove (4) and that each sub-space is
characterized by a proposition $case^l_k$ for $l = 1, \ldots, L_k$ provided that
$case^1_k \lor \ldots \lor case^{L_k}_k$. The proof of (4) can then be replaced with

$$case^l_k \Rightarrow [(P_\alpha(v^c, d_\alpha) \land P_\beta(v^c, d_\beta) \land \ldots) \Rightarrow [P_k(v^c, z^c_k) \Rightarrow P_k(t_{y^c_1,\ldots,y^c_n}(v^c), z^c_k)]] \qquad (5)$$

for $l = 1, \ldots, L_k$ and $k = 1, \ldots, N$.
We suppose that d_α, d_β, ... are CafeOBJ terms denoting $d_\alpha, d_\beta, \ldots$ Then the
proof passage of (5) has the form


open ISTEP
-- arbitrary objects
  op y_1^c : -> V_1 .  ...  op y_n^c : -> V_n .
-- assumptions
  Declaration of equations denoting case_k^l.
-- successor state
  eq s' = t(s,y_1^c,...,y_n^c) .
-- check
  red (inv_α(s,d_α) and inv_β(s,d_β) and ...) implies istep_k(z_k^c) .
close

for l = 1,...,L_k and k = 1,...,N.


⁴ Generally, such case splitting should be done while $\forall v : \mathcal{R}_{\mathcal{S}}.\ p(v)$ is being proved.

Equations available in a proof passage “open M ... close” are those declared
in the module M and the modules imported by M plus those declared in the
proof passage. We say that the lefthand side $l$ of an equation $l = r$ (a term $t$) is
(ir)reducible in a proof passage if $l$ ($t$) is (ir)reducible wrt $E \setminus \{l = r\}$ ($E$), where
$E$ is the set of all equations available in the proof passage.

We briefly describe the proof scores of $\forall v : \mathcal{R}_{\mathcal{S}_{Tlock}}.\ MX(v)$.


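Example 3 (Proof scores of $\forall v : \mathcal{R}_{\mathcal{S}_{Tlock}}.\ MX(v)$). We need four more state
predicates to prove $\forall v : \mathcal{R}_{\mathcal{S}_{Tlock}}.\ MX(v)$, which are found while proving it. The four
state predicates are as follows:

$p_2(v) \triangleq \forall i, j : Pid.\ [(pc_i(v) = cs \land pc_j(v) = l2 \land ticket_j(v) = turn(v)) \Rightarrow i = j]$,
$p_3(v) \triangleq \forall i : Pid.\ (pc_i(v) = cs \Rightarrow turn(v) < tvm(v))$,
$p_4(v) \triangleq \forall i, j : Pid.\ [(pc_i(v) = l2 \land pc_j(v) = l2 \land ticket_i(v) = ticket_j(v)) \Rightarrow i = j]$, and
$p_5(v) \triangleq \forall i : Pid.\ (pc_i(v) = l2 \Rightarrow ticket_i(v) < tvm(v))$.

The proof of $\forall v : \mathcal{R}_{\mathcal{S}_{Tlock}}.\ MX(v)$ needs $p_2$, that of $\forall v : \mathcal{R}_{\mathcal{S}_{Tlock}}.\ p_2(v)$ needs $MX$,
$p_3$ and $p_4$, that of $\forall v : \mathcal{R}_{\mathcal{S}_{Tlock}}.\ p_3(v)$ needs $MX$ and $p_5$, that of $\forall v : \mathcal{R}_{\mathcal{S}_{Tlock}}.\ p_4(v)$
needs $p_5$, and that of $\forall v : \mathcal{R}_{\mathcal{S}_{Tlock}}.\ p_5(v)$ needs no other state predicates.
The module INV is declared as follows: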


mod INV { pr(TLOCK)
  ops i j : -> Pid
  op inv1 : Sys Pid Pid -> Bool   op inv2 : Sys Pid Pid -> Bool
  op inv3 : Sys Pid -> Bool       op inv4 : Sys Pid Pid -> Bool
  op inv5 : Sys Pid -> Bool
  var S : Sys   vars I J : Pid
  eq inv1(S,I,J) = ((pc(S,I) = cs and pc(S,J) = cs) implies I = J) .
  eq inv2(S,I,J) = ((pc(S,I) = cs and pc(S,J) = l2
                     and ticket(S,J) = turn(S)) implies I = J) .
  eq inv3(S,I)   = (pc(S,I) = cs implies turn(S) < tvm(S)) .
  eq inv4(S,I,J) = ((pc(S,I) = l2 and pc(S,J) = l2
                     and ticket(S,I) = ticket(S,J)) implies I = J) .
  eq inv5(S,I)   = (pc(S,I) = l2 implies ticket(S,I) < tvm(S)) .
}


The module ISTEP is declared as follows: 


mod ISTEP { pr(INV)
  ops s s' : -> Sys
  op istep1 : Pid Pid -> Bool   op istep2 : Pid Pid -> Bool
  op istep3 : Pid -> Bool       op istep4 : Pid Pid -> Bool
  op istep5 : Pid -> Bool
  vars I J : Pid
  eq istep1(I,J) = inv1(s,I,J) implies inv1(s',I,J) .
  eq istep2(I,J) = inv2(s,I,J) implies inv2(s',I,J) .
  eq istep3(I)   = inv3(s,I) implies inv3(s',I) .
  eq istep4(I,J) = inv4(s,I,J) implies inv4(s',I,J) .
  eq istep5(I)   = inv5(s,I) implies inv5(s',I) .
}

Let us consider the following proof passage of $\forall v : \mathcal{R}_{\mathcal{S}_{Tlock}}.\ MX(v)$:


open ISTEP
-- arbitrary values
  op k : -> Pid .
-- assumptions
  -- eq c-check(s,k) = true .
  eq pc(s,k) = l2 .   eq ticket(s,k) = turn(s) .
  eq i = k .   eq (j = k) = false .   eq pc(s,j) = cs .
-- successor state
  eq s' = check(s,k) .
-- check
  red istep1(i,j) .
close

The proof passage corresponds to a (sub-)case obtained by splitting the
induction case for $check_k$. The (sub-)case is referred to as case 1.check.1.1.0.1.
CafeOBJ returns false for the proof passage. From the five equations that
characterize the (sub-)case, however, we can conjecture $p_2$. When inv2(s,j,i) implies
istep1(i,j) is used instead of istep1(i,j), CafeOBJ returns true for the
proof passage.

Let us consider the following proof passage of $\forall v : \mathcal{R}_{\mathcal{S}_{Tlock}}.\ p_2(v)$:

open ISTEP
-- arbitrary values
  op k : -> Pid .
-- assumptions
  -- eq c-exit(s,k) = true .
  eq pc(s,k) = cs .
  eq (i = k) = false .   eq (j = k) = false .   eq pc(s,i) = cs .
-- successor state
  eq s' = exit(s,k) .
-- check
  red istep2(i,j) .
close

The proof passage corresponds to a (sub-)case obtained by splitting the
induction case for $exit_k$. The (sub-)case is referred to as case 2.exit.1.0.0.1.
Although CafeOBJ returns neither true nor false for the proof passage, we
notice that inv1(s,i,k) reduces to false in the proof passage. Therefore,
we use inv1(s,i,k) implies istep2(i,j) instead of istep2(i,j) and then
CafeOBJ returns true for the proof passage.
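A proof score establishes the five predicates for all reachable states. As a sanity check of a much weaker kind, they can also be tested on the states reachable within a bounded number of steps. The sketch below does this in Python rather than CafeOBJ; the state encoding, the depth bound, the initial ticket value 0 and all function names are assumptions made for illustration, and a bounded search is evidence, not a proof.

```python
from typing import NamedTuple, Tuple

L1, L2, CS = "l1", "l2", "cs"


class Sys(NamedTuple):
    tvm: int
    turn: int
    ticket: Tuple[int, ...]
    pc: Tuple[str, ...]


def put(values, i, x):
    return values[:i] + (x,) + values[i + 1:]


def get(s, i):
    if s.pc[i] != L1:
        return s
    return Sys(s.tvm + 1, s.turn, put(s.ticket, i, s.tvm), put(s.pc, i, L2))


def check(s, i):
    if not (s.pc[i] == L2 and s.ticket[i] == s.turn):
        return s
    return s._replace(pc=put(s.pc, i, CS))


def exit_(s, i):
    if s.pc[i] != CS:
        return s
    return s._replace(turn=s.turn + 1, pc=put(s.pc, i, L1))


def predicates(s):
    """Truth values of MX and p2..p5 in state s (quantifiers expanded)."""
    ids = range(len(s.pc))
    pairs = [(i, j) for i in ids for j in ids]
    return {
        "MX": all(i == j for i, j in pairs
                  if s.pc[i] == CS and s.pc[j] == CS),
        "p2": all(i == j for i, j in pairs
                  if s.pc[i] == CS and s.pc[j] == L2 and s.ticket[j] == s.turn),
        "p3": all(s.turn < s.tvm for i in ids if s.pc[i] == CS),
        "p4": all(i == j for i, j in pairs
                  if s.pc[i] == L2 and s.pc[j] == L2
                  and s.ticket[i] == s.ticket[j]),
        "p5": all(s.ticket[i] < s.tvm for i in ids if s.pc[i] == L2),
    }


def reachable(n, depth):
    """States reachable from the initial state in at most `depth` steps."""
    start = Sys(0, 0, (0,) * n, (L1,) * n)
    seen, frontier = {start}, [start]
    for _ in range(depth):
        nxt = []
        for s in frontier:
            for t in (get, check, exit_):
                for i in range(n):
                    s2 = t(s, i)
                    if s2 not in seen:
                        seen.add(s2)
                        nxt.append(s2)
        frontier = nxt
    return seen


def violations(n, depth):
    return [(s, name) for s in reachable(n, depth)
            for name, ok in predicates(s).items() if not ok]
```

For example, `violations(3, 9)` lists every explored state of a three-process system in which one of the five predicates fails.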














3 Tips 


What we should do to prove a state predicate invariant wrt an OTS consists of
three tasks: (1) use of simultaneous induction, (2) case splitting and (3) predicate
(lemma) discovery/use. We use the proof of $\forall v : \mathcal{R}_{\mathcal{S}_{Tlock}}.\ MX(v)$ to describe the
three tasks.


Some Tips on Writing Proof Scores in the OTS/CafeOBJ Method 607 


3.1 Simultaneous Induction 


The first thing to do is to use simultaneous induction to break the proof into
four (sub-)goals (one is the base case and the others are the three induction
cases), for which four proof passages are written. The proof passage of the base
case is as follows:

open INV
  red inv1(init,i,j) .
close

The proof passage of the induction case for $check_k$ is as follows:

open ISTEP
  op k : -> Pid .
  eq s' = check(s,k) .
  red istep1(i,j) .
close

The case is referred to as case 1.check. The proof passages of the remaining two
induction cases are written likewise.

CafeOBJ returns true for the base case but neither true nor false for each
of the three induction cases. What to do for the three induction cases is case
splitting and/or predicate discovery/use.


3.2 First Thing to Do for Each Induction Case 


Each induction case for $t_{y_1,\ldots,y_n}$ is split into two (sub-)cases: (1) $c\text{-}t_{y_1,\ldots,y_n}$ and
(2) $\lnot c\text{-}t_{y_1,\ldots,y_n}$, unless $c\text{-}t_{y_1,\ldots,y_n}$ holds in every case. Case 1.check is split into the
two (sub-)cases whose corresponding proof passages are as follows:

open ISTEP                        open ISTEP
  op k : -> Pid .                   op k : -> Pid .
  eq c-check(s,k) = true .          eq c-check(s,k) = false .
  eq s' = check(s,k) .              eq s' = check(s,k) .
  red istep1(i,j) .                 red istep1(i,j) .
close                             close

The two (sub-)cases are referred to as case 1.check.1 and 1.check.0. CafeOBJ
returns true for case 1.check.0 but neither true nor false for case 1.check.1.
CafeOBJ always returns true for the (sub-)case where $\lnot c\text{-}t_{y_1,\ldots,y_n}$ due to
Definition 1 if the OTS concerned is correctly written in CafeOBJ.


3.3 Appropriate Equations Declared in Proof Passages 


As shown, each (sub-)case is characterized by equations. Equational reasoning by
rewriting is used to check if a proposition holds in each case, but full equational
reasoning power is not used because CafeOBJ does not employ any completion
facilities. Therefore, the equations that characterize a case heavily affect the success
in proving that a proposition holds in the case. We describe appropriate
equations, which characterize a case, declared in a proof passage. If CafeOBJ returns
true for a proof passage, nothing should be done. Otherwise, the equations in
the proof passage should be made appropriate in the sense described below.


– The lefthand side of each equation should be irreducible in a proof passage so
  that the equation can be used effectively as a rewrite rule. This is because the
  rewriting strategy adopted by CafeOBJ is basically an innermost strategy.

– Let $PP(E)$, where $E$ is a set of equations, be a proof passage in which
  the equations in $E$ are declared, and $E_1$ and $E_2$ be sets of equations. We
  suppose that $\bigwedge_{e_1 \in E_1} e_1$ is equivalent to $\bigwedge_{e_2 \in E_2} e_2$. If every equation in $E_1$
  can be proved by rewriting from $PP(E_2)$ but not every equation in $E_2$ can
  be proved by rewriting from $PP(E_1)$, then $E_2$ should be used instead of $E_1$.
  Some examples are given.

  1. Let $E_1$ be $\{p_1 \land p_2 = true\}$ and $E_2$ be $\{p_1 = true, p_2 = true\}$. We
     suppose that $p_1 \land p_2$, $p_1$ and $p_2$ are irreducible in $PP(\emptyset)$. Then, $p_1 \land p_2$
     reduces to true in $PP(E_2)$ but $p_1$ ($p_2$) does not necessarily reduce to
     true in $PP(E_1)$. Therefore, $E_2$ should be used instead of $E_1$.

  2. Let $c$ be a binary data constructor. We suppose that $c(a_1, b_1)$ equals
     $c(a_2, b_2)$ if and only if $a_1$ equals $a_2$ and $b_1$ equals $b_2$. Let $E_1$ be
     $\{c(a_1, b_1) = c(a_2, b_2)\}$ and $E_2$ be $\{a_1 = a_2, b_1 = b_2\}$. We suppose that
     $c(a_1, b_1)$, $a_1$ and $b_1$ are irreducible in $PP(\emptyset)$. Then, both $c(a_1, b_1)$ and
     $c(a_2, b_2)$ reduce to the same term in $PP(E_2)$ but $a_1$ and $a_2$ ($b_1$ and $b_2$) do
     not necessarily reduce to the same term in $PP(E_1)$. Therefore, $E_2$ should
     be used instead of $E_1$.

  3. Let $n$ be a natural number, $N$ be a constant denoting an arbitrary
     multiset of natural numbers, and the juxtaposition operator be a data
     constructor of multisets. The juxtaposition operator is declared as
     op __ : Bag Bag -> Bag {assoc comm id: empty}, where Bag is the
     visible sort for multisets of natural numbers and is a supersort of Nat,
     assoc and comm specify that the operator is associative and
     commutative, and id: empty specifies that empty, which is the constant denoting
     the empty multiset, is an identity of the operator. We suppose that we
     want to specify that $N$ includes $n$. One way is to use $n \in N = true$, and
     the other way is to use $N = n\ N'$, where $N'$ is another constant denoting
     an arbitrary multiset of natural numbers⁵. Let $E_1$ be $\{n \in N = true\}$
     and $E_2$ be $\{N = n\ N'\}$. We suppose that $n \in N$ and $N$ are irreducible
     in $PP(\emptyset)$. Then, $n \in N$ reduces to true in $PP(E_2)$ if $\in$ is defined
     appropriately in equations, but $N$ and $n\ N'$ do not necessarily reduce to the
     same term in $PP(E_1)$. Therefore, $E_2$ should be used instead of $E_1$.

  4. $\lnot p$ is reducible in any proof passage because of the Hsiang TRS. If $p$
     is irreducible in a proof passage, $\lnot p$ reduces to p xor true in the proof
     passage. Therefore, one way of making the equation $(\lnot p) = true$ effective
     is to use (p xor true) = true. But, p = false is more appropriate.

  5. This example is a variant. Let $E_1$ be $\{(l = r) = true\}$ and $E_2$ be $\{l = r\}$.
     We suppose that $l = r$ and $l$ are irreducible in $PP(\emptyset)$. $l = r$ reduces to
     true in both $PP(E_1)$ and $PP(E_2)$. It is often the case, however, that $E_2$
     is more appropriate than $E_1$ because $l$ reduces to $r$ in $PP(E_2)$ but $l$ does
     not in $PP(E_1)$.

⁵ Since $N$ is an arbitrary multiset and includes $n$, $N$ must be $n'\ N'$, where (1) $n'$
equals $n$ or (2) $n \in N'$. We can select (1) because the juxtaposition operator is
associative and commutative.
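The effect described in example 1 and in the first guideline can be reproduced with a toy rewriter. The sketch below is Python, not CafeOBJ: it uses ground rules only, a single built-in rule for conjunction in place of the Hsiang TRS, and an innermost strategy. The names `normalize`, `proves`, `BOOL_RULES`, `E1` and `E2` are assumptions made for illustration.

```python
TRUE = "true"

# A single Boolean rule stands in for the built-in Boolean rewriting.
BOOL_RULES = [(("and", TRUE, TRUE), TRUE)]


def normalize(term, rules):
    """Innermost rewriting: normalize the arguments, then try rules at the root."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(normalize(arg, rules) for arg in term[1:])
    for lhs, rhs in rules:
        if term == lhs:
            return normalize(rhs, rules)
    return term


def proves(passage_equations, lhs, rhs):
    """Does lhs = rhs follow by rewriting in a passage with these equations?"""
    rules = BOOL_RULES + passage_equations
    return normalize(lhs, rules) == normalize(rhs, rules)


E1 = [(("and", "p1", "p2"), TRUE)]      # {p1 and p2 = true}
E2 = [("p1", TRUE), ("p2", TRUE)]       # {p1 = true, p2 = true}

# PP(E2) proves the equation of E1 ...
e2_proves_e1 = proves(E2, ("and", "p1", "p2"), TRUE)
# ... but PP(E1) cannot rewrite p1 on its own.
e1_proves_p1 = proves(E1, "p1", TRUE)

# A rule whose lefthand side is reducible never fires under innermost rewriting:
# p1 and p2 are rewritten first, so the root never matches ("and", "p1", "p2").
shadowed = normalize(("and", "p1", "p2"),
                     BOOL_RULES + E2 + [(("and", "p1", "p2"), "unused")])
```

Here `e2_proves_e1` is true and `e1_proves_p1` is false, which is the asymmetry that makes $E_2$ preferable, and `shadowed` shows why a lefthand side should be irreducible in the passage.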


According to what has been described in this subsection, the proof passage 
of case 1.check.1 should be rewritten as follows: 


open ISTEP
  op k : -> Pid .
  -- eq c-check(s,k) = true .
  eq pc(s,k) = l2 .   eq ticket(s,k) = turn(s) .
  eq s' = check(s,k) .
  red istep1(i,j) .
close


CafeOBJ still returns neither true nor false for this proof passage. Then, what 
we should do is further case splitting. 


3.4 Further Case Splitting 


For a proof passage for which CafeOBJ returns neither true nor false, the case 
corresponding to the proof passage is split into multiple (sub-)cases in each of 
which CafeOBJ returns either true or false. When CafeOBJ returns true in a 
(sub-)case, nothing should be done for the case. When CafeOBJ returns false 
in a (sub-)case, it is necessary to find a state predicate that does not hold in the 
case and is likely invariant wrt an OTS concerned. 

There are some ways of splitting a case into multiple (sub-)cases. 


– Based on a proposition $p$: A case is split into two (sub-)cases where (1) $p$
  holds and (2) $p$ does not, respectively. As shown in Subsect. 3.2, case 1.check
  is split into the two (sub-)cases based on the proposition c-check(s,k).

– Based on data constructors: We suppose that a data type has $M$ data
  constructors. Then, a case is split into $M$ (sub-)cases. Some examples are given.

  1. Nat has the two data constructors 0 and s. Let $x$ be a constant denoting
     an arbitrary natural number in a proof passage. The case corresponding
     to the proof passage is split into the two (sub-)cases where (1) $x = 0$ and
     (2) $x = s(y)$, where $y$ is another constant denoting an arbitrary natural
     number. Case (1) means that $x$ is zero and case (2) means that $x$ is not
     zero.

  2. Bag has the two data constructors empty and __. Let $N$ be a constant
     denoting an arbitrary multiset in a proof passage. The case corresponding
     to the proof passage is split into the two (sub-)cases where (1) $N = empty$
     and (2) $N = n'\ N'$, where $n'$ is a constant denoting an arbitrary natural
     number and $N'$ is a constant denoting an arbitrary multiset. Case (1)
     means that $N$ is empty and case (2) means that $N$ is not empty.

– Based on a tautology whose form is $p_1 \lor \ldots \lor p_M$: A case is split into $M$
  (sub-)cases where (1) $p_1$ holds, ..., ($M$) $p_M$ holds. This case splitting generalizes
  the case splitting based on a proposition because $p \lor \lnot p$ is a tautology.


In order to apply one of the three ways of splitting a case, we need to find a
proposition, a constant denoting an arbitrary value of a data type, or a tautology
whose form is $p_1 \lor \ldots \lor p_M$. There are usually multiple candidates based on which
a case is split. A selection from such candidates affects how well a proof concerned
is conducted. It is necessary to understand the OTS concerned and to have
experience in writing proof scores so as to select a better one among such
candidates. There are some heuristic rules, however, to select one among such
candidates.

– Select a proposition that directly affects the truth value of a proposition to
  prove such as istep1(i,j). Since istep1(i,j) reduces to true in case
  1.check.1 if i equals j, the proposition i = j may be a good candidate.

– Select a proposition p if p appears in a result obtained by reducing a
  proposition to prove. If p appears at the conditional position of if_then_else_fi
  such as if p then a else b fi, p may be a good candidate.


We describe how to split case 1.check.1. CafeOBJ returns ((if (k = i)
then cs else pc(s,i) fi) = cs) and ... for the corresponding proof
passage. Then, we select the proposition k = i to split the case. The equation i =
k is declared⁶ in one proof passage whose corresponding case is referred to as case
1.check.1.1, and the equation (i = k) = false is declared in the other proof
passage whose corresponding case is referred to as case 1.check.1.0.

Since CafeOBJ returns if (k = j) then cs else pc(s,j) fi = cs and
... for the proof passage corresponding to case 1.check.1.1, we select the
proposition k = j to split the case. The equation j = k is declared in one proof
passage whose corresponding case is referred to as case 1.check.1.1.1, and the equation
(j = k) = false is declared in the other proof passage whose corresponding
case is referred to as case 1.check.1.1.0. CafeOBJ returns true for the former proof
passage, but pc(s,j) = cs xor true for the latter proof passage. Then, case
1.check.1.1.0 is also split based on pc(s,j) = cs. The equation pc(s,j) = cs
is declared in one proof passage whose corresponding case is referred to as case
1.check.1.1.0.1, and the equation (pc(s,j) = cs) = false is declared in the
other proof passage whose corresponding case is referred to as case 1.check.1.1.0.0.
CafeOBJ returns false for the former proof passage and true for the latter
proof passage. Case 1.check.1.0 can be split into four (sub-)cases in the same
way as case 1.check.1.1.
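The outcomes reported for these (sub-)cases can be compared with a semantic evaluation on a small finite model. The sketch below is Python and evaluates istep1 directly on every state of a two-process model with bounded numbers. It does not perform rewriting, so its verdict "neither" only means that the case contains both states where the step holds and states where it fails. The encoding, the bounds and the names `CASES` and `verdict` are assumptions made for illustration.

```python
from itertools import product
from typing import NamedTuple, Tuple

L1, L2, CS = "l1", "l2", "cs"


class Sys(NamedTuple):
    tvm: int
    turn: int
    ticket: Tuple[int, ...]
    pc: Tuple[str, ...]


def c_check(s, k):
    return s.pc[k] == L2 and s.ticket[k] == s.turn


def check(s, k):
    if not c_check(s, k):
        return s
    return s._replace(pc=s.pc[:k] + (CS,) + s.pc[k + 1:])


def inv1(s, i, j):
    return not (s.pc[i] == CS and s.pc[j] == CS) or i == j


def istep1(s, i, j, k):
    """inv1(s,i,j) implies inv1(s',i,j) with s' = check(s,k)."""
    return (not inv1(s, i, j)) or inv1(check(s, k), i, j)


def states(n=2, bound=2):
    nums = range(bound)
    for tvm, turn in product(nums, repeat=2):
        for ticket in product(nums, repeat=n):
            for pc in product((L1, L2, CS), repeat=n):
                yield Sys(tvm, turn, ticket, pc)


# Each case is the conjunction of the equations declared in its proof passage.
CASES = {
    "1.check.0": lambda s, i, j, k: not c_check(s, k),
    "1.check.1": lambda s, i, j, k: c_check(s, k),
    "1.check.1.1": lambda s, i, j, k: c_check(s, k) and i == k,
    "1.check.1.1.1": lambda s, i, j, k: c_check(s, k) and i == k and j == k,
    "1.check.1.1.0.1": lambda s, i, j, k: (c_check(s, k) and i == k
                                          and j != k and s.pc[j] == CS),
    "1.check.1.1.0.0": lambda s, i, j, k: (c_check(s, k) and i == k
                                          and j != k and s.pc[j] != CS),
}


def verdict(name, n=2):
    case = CASES[name]
    results = {istep1(s, i, j, k)
               for s in states(n)
               for i, j, k in product(range(n), repeat=3)
               if case(s, i, j, k)}
    if results == {True}:
        return "true"
    if results == {False}:
        return "false"
    return "neither"
```

On this model the leaves behave as described above: the step holds throughout 1.check.0, 1.check.1.1.1 and 1.check.1.1.0.0, fails throughout 1.check.1.1.0.1, and is mixed in the unsplit cases 1.check.1 and 1.check.1.1.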


3.5 Predicate (Lemma) Discovery /Use 


When CafeOBJ returns false for a proof passage, there are two possibilities:
(1) if an arbitrary state characterized by the case corresponding to the proof
passage is not reachable wrt an OTS $\mathcal{S}$ concerned, the case can be discharged,
and (2) otherwise, a state predicate concerned is not invariant wrt $\mathcal{S}$. If a state
predicate is invariant wrt $\mathcal{S}$ and does not hold in the case, then an arbitrary
state characterized by the case is not reachable wrt $\mathcal{S}$. That is why we find a
state predicate that does not hold in the case and is likely invariant wrt $\mathcal{S}$.

Let $E$ be a set of equations that characterize a case such that CafeOBJ
returns false for a proof passage corresponding to the case. We suppose that
$\bigwedge_{e \in E} e$ is equivalent to a proposition whose form is $Q(v^c, z^c_\alpha)$. Let $q(v)$ be
$\forall z_\alpha : D_{q\alpha}.\ \lnot Q(v, z_\alpha)$. Since $q$ surely does not hold in the case characterized by
$E$, $q$ is one possible candidate. Generally, $q'$ such that $q' \Rightarrow q$ can be a candidate
because $q'$ does not hold in the case characterized by $E$.

⁶ Note that i = k is declared instead of k = i.

Let us consider the proof passage corresponding to case 1.check.1.1.0.1
shown in Example 3. From the five equations that characterize the case, we
obtain the proposition pc(s,i) = l2 and pc(s,j) = cs and ticket(s,i) =
turn(s) and not(j = i) by concatenating them with conjunctions,
substituting k with i because of the equation i = k, and deleting the tautology i = i.
$p_2$ is obtained from the proposition.

Some contradiction may be found in a set of equations that characterize a 
case even when CafeOBJ does not return false in a proof passage corresponding 
to the case. If that is the case, a state predicate can be obtained from the 
contradiction such that the state predicate does not hold in the case and is 
likely invariant wrt an OTS concerned. 

Let us consider the proof passage corresponding to case 2.exit.1.0.0.1 shown
in Example 3. We notice that the three equations pc(s,k) = cs, pc(s,i) = cs
and (i = k) = false contradict $\forall v : \mathcal{R}_{\mathcal{S}_{Tlock}}.\ MX(v)$ and inv1(s,i,k) can be
used in the proof passage.

Even when any contradictions are not found in a set of equations that charac- 
terize a case and CafeOBJ does not return false in a proof passage correspond- 
ing to the case, a state predicate may be found such that the state predicate can 
be used to discharge the case and is likely invariant wrt an OTS concerned. 

Let us consider the proof passage corresponding to case 1.check.1.1.0. 
CafeOBJ returns pc(s,j) = cs xor true for the proof passage, but inv2(s,j, 
i) also reduces to pc(s,j) = cs xor true in the proof passage. Therefore, 
inv2(s,j,i) can be used to discharge the case and it is not necessary to split 
the case anymore. 
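The role of $p_2$ as a lemma for $MX$ can be illustrated on a finite model. The Python sketch below enumerates all states of a two-process model with bounded numbers, reachable or not, and looks for states that satisfy a hypothesis but whose successor violates $MX$. The encoding, the bounds and the names `step_failures`, `mx` and `p2` are assumptions made for illustration; the check only concerns the step for $MX$ on this model and says nothing about how $p_2$ itself is proved.

```python
from itertools import product
from typing import NamedTuple, Tuple

L1, L2, CS = "l1", "l2", "cs"


class Sys(NamedTuple):
    tvm: int
    turn: int
    ticket: Tuple[int, ...]
    pc: Tuple[str, ...]


def put(values, i, x):
    return values[:i] + (x,) + values[i + 1:]


def get(s, i):
    if s.pc[i] != L1:
        return s
    return Sys(s.tvm + 1, s.turn, put(s.ticket, i, s.tvm), put(s.pc, i, L2))


def check(s, i):
    if not (s.pc[i] == L2 and s.ticket[i] == s.turn):
        return s
    return s._replace(pc=put(s.pc, i, CS))


def exit_(s, i):
    if s.pc[i] != CS:
        return s
    return s._replace(turn=s.turn + 1, pc=put(s.pc, i, L1))


def mx(s):
    # At most one process is in the critical section.
    return sum(label == CS for label in s.pc) <= 1


def p2(s):
    # No process is at cs while another one at l2 holds the current turn.
    in_cs = any(label == CS for label in s.pc)
    ready = any(l == L2 and t == s.turn for l, t in zip(s.pc, s.ticket))
    return not (in_cs and ready)


def all_states(n=2, bound=3):
    nums = range(bound)
    for tvm, turn in product(nums, repeat=2):
        for ticket in product(nums, repeat=n):
            for pc in product((L1, L2, CS), repeat=n):
                yield Sys(tvm, turn, ticket, pc)


def step_failures(hyp, goal, n=2):
    """(transition, process, state) triples where hyp holds before and goal fails after."""
    return [(t.__name__, k, s)
            for s in all_states(n) if hyp(s)
            for t in (get, check, exit_) for k in range(n)
            if not goal(t(s, k))]


alone = step_failures(mx, mx)                                # MX as its own hypothesis
with_lemma = step_failures(lambda s: mx(s) and p2(s), mx)    # MX strengthened by p2
```

On this model `alone` is non-empty, every failure comes from `check`, and every failing state violates `p2`, while `with_lemma` is empty.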


4 Soundness of Proof Scores 


Let us consider the proof of $\forall v : \mathcal{R}_{\mathcal{S}}.\ (p_1(v) \land \ldots \land p_N(v))$ described in
Subsect. 2.4 again. If CafeOBJ returns true for each proof passage in the proof
scores, $p_1, \ldots, p_N$ are really invariant wrt $\mathcal{S}$ provided that

1. Needless to say, the computer (including the operating system, the hardware,
   etc.) on which CafeOBJ works is reliable,

2. Equational reasoning is sound and rewriting faithfully (partially though)
   implements equational reasoning [8]; the CafeOBJ implementation of rewriting
   is reliable,

3. The Hsiang TRS is sound [12]; the TRS is reliably implemented in CafeOBJ,

4. The built-in equality operator _==_ is not used,

5. $\mathcal{S}$ is specified in CafeOBJ in the way described in Subsect. 2.3, and

6. The proof scores of $\forall v : \mathcal{R}_{\mathcal{S}}.\ (p_1(v) \land \ldots \land p_N(v))$ are written in the way
   described in Subsect. 2.4.


When CafeOBJ meets the term a == b, it first reduces a and b to a' and b', which
are irreducible wrt a set of equations (rewrite rules) concerned, and returns true
if a' is exactly the same as b' and false otherwise. The combination of _==_ and
not_ can damage the soundness. Since the built-in inequality operator _=/=_ is
the combination of _==_ and not_, it should not be used either. Let us consider
the following module:


mod! DATA { [Data]
  ops d1 d2 : -> Data
}

We try to prove $\forall d : Data.\ \lnot(d = d2)$ by writing a proof score. A plausible proof
score that consists of one proof passage is as follows:


open DATA 
op d : -> Data . -- an arbitrary value of Data. 
red not(d == d2) . -- or red d =/= d2 . 

close 


CafeOBJ returns true for this proof passage, which contradicts the fact that
there exists the counterexample d2. Therefore, users should declare an equality
operator such as _=_ for each visible sort and equations defining it instead of
_==_ and _=/=_.
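The problem can be mimicked with a small model of syntactic comparison of normal forms. The Python sketch below is not CafeOBJ's implementation; `normal_form`, `builtin_eq` and `red_not_eq` are hypothetical names, and rewrite rules are modelled as a dictionary from constants to constants.

```python
def normal_form(term, rules):
    """Rewrite a constant until no rule applies."""
    while term in rules:
        term = rules[term]
    return term


def builtin_eq(a, b, rules):
    """Mimics _==_: true exactly when the two normal forms are identical."""
    return normal_form(a, rules) == normal_form(b, rules)


def red_not_eq(a, b, rules):
    """Mimics red not(a == b)."""
    return not builtin_eq(a, b, rules)


# d is an uninterpreted constant standing for an arbitrary value: no rules.
claimed = red_not_eq("d", "d2", {})

# Instantiating d with each actual value of Data tells a different story.
instances = {c: red_not_eq("d", "d2", {"d": c}) for c in ("d1", "d2")}
```

`claimed` is true because the constants d and d2 are syntactically different, yet `instances["d2"]` is false: the reduction treats "not known to be equal" as "different", which is not a valid step for an arbitrary d.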

Under the above six assumptions, the only thing that we should take care of 
on the soundness is whether all necessary cases are checked by rewriting for each 
proof passage. A possible source of damaging it is transitions. Since transitions 
are functions on states in OTSs, however, the source can be dismissed. Every 
operator is a function in CafeOBJ as well. Therefore, rewriting surely covers all 
necessary cases for each proof passage. 

Note that we do not have to assume that the CafeOBJ specification of $\mathcal{S}$,
when it is regarded as a TRS, is terminating or confluent for the soundness.
If the CafeOBJ specification is not terminating, CafeOBJ may not return any
results for a proof passage forever. This hinders the success of proofs, but does
not affect the soundness.

We suppose that a term a has two irreducible forms a' and a'' in a proof
passage because the CafeOBJ specification is not confluent and that a actually
reduces to a' but not to a''. Although CafeOBJ ignores a rewriting sequence
that starts with a and ends in a'', this does not affect the soundness because
a' equals a'' from an equational reasoning point of view and it is enough to use
either a' or a''. Whether the CafeOBJ specification is confluent, however, can
affect the success in proofs. Let us consider the following module:

mod! DATA2 { [Data2]
  ops d1 d2 d3 : -> Data2
  op _=_ : Data2 Data2 -> Bool {comm}
  var D : Data2
  eq (D = D) = true .
  eq d1 = d2 .   eq d1 = d3 .
}


We try to prove d1 = d3 by writing a proof passage. The case is split into two
(sub-)cases where (1) d2 = d3 and (2) d2 ≠ d3. Then, the proof score that
consists of two proof passages is as follows:

open DATA2
  eq d2 = d3 .
  red d1 = d3 .
close
open DATA2
  eq (d2 = d3) = false .
  red d1 = d3 .
close

CafeOBJ returns true for the first proof passage and false for the second proof
passage. We get stuck on the second proof passage unless we notice the equation d1
= d3 in the module DATA2.
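How the outcome depends on which of the two rules for d1 is applied can be seen in a toy model. The Python sketch below rewrites constants with an ordered list of rules and applies the first rule that matches; that CafeOBJ chooses rules in declaration order is an assumption of the sketch, not a statement about the system, and `normal_form` and `red_equal` are hypothetical names.

```python
def normal_form(const, rules):
    """Rewrite a constant, always using the first applicable rule."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if const == lhs:
                const, changed = rhs, True
                break
    return const


def red_equal(a, b, rules, known_unequal=()):
    """Mimics red a = b with eq (D = D) = true and optional (x = y) = false facts."""
    a, b = normal_form(a, rules), normal_form(b, rules)
    if a == b:
        return "true"
    if frozenset((a, b)) in known_unequal:
        return "false"
    return f"{a} = {b}"


DATA2 = [("d1", "d2"), ("d1", "d3")]          # eq d1 = d2 .  eq d1 = d3 .
UNEQUAL = {frozenset(("d2", "d3"))}           # eq (d2 = d3) = false .

first = red_equal("d1", "d3", DATA2 + [("d2", "d3")])       # passage (1)
second = red_equal("d1", "d3", DATA2, UNEQUAL)              # passage (2)
swapped = red_equal("d1", "d3", list(reversed(DATA2)), UNEQUAL)
```

With the rules in the order of DATA2, `first` is "true" and `second` is "false"; with the two rules for d1 swapped, the second passage also yields "true". Since d1 = d3 is an equation of DATA2, the assumption of the second passage is inconsistent with the module, and the non-confluent rules merely hide this.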


From what has been described, it is desirable that the CafeOBJ specification
of $\mathcal{S}$ is terminating and confluent.

We can check if proof scores that state predicates are invariant wrt $\mathcal{S}$
conform to what is described in Subsect. 2.4. We suppose that all proofs are
conducted by simultaneous induction. Let $P$ and $P'$ be sets of state predicates such
that $P'$ is empty. A procedure that makes such a check is as follows:


1. If $P$ is empty, the procedure successfully terminates, which means that the
   proof score of $\forall v : \mathcal{R}_{\mathcal{S}}.\ p(v)$ for each $p \in P'$ conforms to what is described
   in Subsect. 2.4; otherwise, extract a predicate $p$ from $P$ and go next.

2. Check if a proof score of $\forall v : \mathcal{R}_{\mathcal{S}}.\ p(v)$ has been written. If so, go next;
   otherwise, the procedure reports that a proof score of $\forall v : \mathcal{R}_{\mathcal{S}}.\ p(v)$ has not
   been written and terminates.

3. Check if the proof score of $\forall v : \mathcal{R}_{\mathcal{S}}.\ p(v)$ conforms to simultaneous
   induction. If so, go next; otherwise, the procedure reports that the proof score of
   $\forall v : \mathcal{R}_{\mathcal{S}}.\ p(v)$ does not conform to simultaneous induction and terminates.

4. Check if the proof score of $\forall v : \mathcal{R}_{\mathcal{S}}.\ p(v)$ covers all necessary cases. If so,
   put $p$ into $P'$, put other state predicates that are used in the proof score
   and that are not in $P'$ into $P$, and go to 1; otherwise, the procedure reports
   that the proof score of $\forall v : \mathcal{R}_{\mathcal{S}}.\ p(v)$ does not cover all necessary cases and
   terminates.
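The four steps amount to a worklist traversal of the "uses" relation between state predicates. A minimal Python sketch follows. The dictionary format for a proof score, with the two checks already reduced to Boolean flags, and the names `check_proof_scores`, `score` and `TLOCK_SCORES` are assumptions made for illustration; deciding the two flags for a real proof score is the hard part and is not modelled here.

```python
def check_proof_scores(goals, scores):
    """Return (True, checked predicates) or (False, reason)."""
    pending = list(goals)          # the set P
    checked = set()                # the set P', initially empty
    while pending:                 # step 1
        p = pending.pop()
        score = scores.get(p)
        if score is None:          # step 2
            return False, f"no proof score written for {p}"
        if not score["by_induction"]:          # step 3
            return False, f"proof score of {p} does not conform to simultaneous induction"
        if not score["covers_all_cases"]:      # step 4
            return False, f"proof score of {p} does not cover all necessary cases"
        checked.add(p)
        for q in score["uses"]:
            if q not in checked and q not in pending:
                pending.append(q)
    return True, checked


def score(*uses, by_induction=True, covers_all_cases=True):
    return {"by_induction": by_induction,
            "covers_all_cases": covers_all_cases,
            "uses": set(uses)}


# The dependencies stated in Example 3.
TLOCK_SCORES = {
    "MX": score("p2"),
    "p2": score("MX", "p3", "p4"),
    "p3": score("MX", "p5"),
    "p4": score("p5"),
    "p5": score(),
}

result = check_proof_scores(["MX"], TLOCK_SCORES)
```

Starting from MX alone, the traversal visits all five predicates. The circular dependency between MX and $p_2$ is harmless because a predicate already in $P'$ is never put back into $P$.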


The procedure can increase the confidence in soundness of proof scores. 


614 Kazuhiro Ogata and Kokichi Futatsugi 


5 Conclusion 


We have described some tips on writing proof scores in the OTS/CafeOBJ 
method and used Tlock, a mutual exclusion protocol using atomicInc, to ex- 
emplify the tips. We have also informally argued soundness of proof scores in 
the OTS/CafeOBJ method. 

We have been developing a tool called Gateau [21] that takes propositions
used for case splitting and state predicates used to strengthen the basic induction
hypothesis, and generates the proof score of an invariant, which conforms to what
is described in Subsect. 2.4.

Proof scores can also be considered proof objects, which can be checked as
described in Sect. 4. We think that it is worthwhile to develop a tool, which is an
implementation of the procedure in Sect. 4 that checks if a proof score conforms
to what is described in Subsect. 2.4. Such a tool can be complementary to Gateau.


Abstract. In this paper, we propose a formulation for inference rules
in Drug Interaction Ontology (DIO). Our formulation for inference rules
is viewed from the standpoint of process-description. The relations in
DIO are now described as resource-sensitive linear logical implications.
The compositional reasoning on certain drug-interactions discussed in
our previous work on DIO is represented as a construction of a linear
logical proof. As examples of our formulation, we use some anti-cancer
drug interactions.³


1 Introduction 


Ontology-oriented knowledgebases have been studied and developed in various
fields, where knowledgebases are designed in accordance with the underlying
ontological structures, such as structures of persistent objects, structures of
functions, structures of processes, etc. Ontologies of the biomedical and bioinformatic
domain have been studied and developed very intensively, as well as ontologies of
other specific domains and domain-independent general ontologies.⁴ Needless to
say, in order to make an ontology-based database useful and practical, it is
important to provide a suitable formal language and an inference engine, which will
make the best use of the ontological structures of the relevant domains. For this
purpose, various formal language frameworks and various inference engines have
been proposed in the literature on biomedical and bioinformatic applications.
Some have employed tree and graph structures for the basic formal
structures, and retrieving-search engines on the trees/graphs are used.⁵ Others have
used the relation-based predicate logic language and its variants as the formal
framework, in which the logical engine based on first order predicate logic is
often employed explicitly or implicitly (e.g. [13]). B. Smith et al. [24],
for example, proposed the use of a somewhat limited number of primitive
predicates/relations for a biological ontology in the setting of a fragment of first order
predicate logic. Description Logics are also considered variants of predicate logic,
which enhance the latter's expressive power (for concepts, for example) to some
extent while preserving its effective computability. The transformation and
integration techniques among different ontology languages are also important, and
some pioneering works have been done by Goguen and others (cf., e.g. [5] [6] [7])
using category theoretical tools.

³ We would like to express our sincere thanks to the anonymous referee for invaluable
comments on earlier versions of this paper.

⁴ For some survey of ontology-methodology for knowledgebases, see e.g. [17] [22] [26].

⁵ E.g. Gene Ontology (GO) Editorial Style Guide [1] [2].

In the domain of drug/pharmaceutical applications, knowledgebases developed
on the basis of molecular-level ontology are particularly useful, and it
would be important to design the reasoning about drug interactions according to
the ontology-based knowledgebases, as well as to the traditional static
ontological structures of drug-related knowledgebases. Although drug
interactions are represented by relations in a static manner, it is desirable
that the interaction processes themselves be captured within the logical
reasoning/inference framework.

The main objective of this paper is to design a logical inference engine for
a process-based biological ontology. For that purpose, we adopt a molecular
interaction-based process ontology modeling method for some drug interactions
(Drug Interaction Ontology: DIO) from [27] [28] [29]. In our previous work on
DIO, we proposed certain schematic or abstract inferences based on basic triadic
relations in molecular interactions, such as “Drug a facilitates the generation
of c under the action of enzyme b in a situated environment”
(facilitate(a, b, c), for short), or “Drug a inhibits the generation of c under
the action of enzyme b in a situated environment”.

While the use of such relations for interaction processes provides a schematic
inference tool in Drug Interaction Ontology, the relation-based reasoning
often hides the process level, and the question concerning the logical
consolidation of the inference engine of DIO has been left open. In this paper,
we introduce a variant of a resource-sensitive logic, linear logic, to explain
the basic inference engine used in our previous work on DIO. The relational
approach is reduced to a more basic process approach, and accordingly the basic
triadic relation for molecular interactions, for example, is expressed as a
logical description of interaction processes, using a resource-sensitive logical
implication such as “The coexistence of resource a and of the environmental
resource b implies product c” (in the linear logical expression, (a, !b) ⊸ c).
Here, !b expresses a relatively large amount of resource b, in the sense that
the consumption of b can be ignored in the context of the reasoning.
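
The resource-sensitive reading of (a, !b) ⊸ c can be illustrated informally. The following Python sketch is not the inference engine of DIO; it is a minimal illustration under our own assumptions, in which a state is a multiset of consumable resources plus a set of reusable ("!") environmental resources. The function name `fire` and the resource names are hypothetical.

```python
from collections import Counter

def fire(state, reusable, consumed, required_env, product):
    """Apply one step of (consumed, !required_env) -o product.

    state:    Counter of consumable resources (a multiset)
    reusable: set of environmental resources marked with '!'
    Returns a new Counter, or None if the step is not applicable.
    """
    if state[consumed] < 1 or required_env not in reusable:
        return None
    new_state = state.copy()
    new_state[consumed] -= 1   # the input a is used up
    new_state[product] += 1    # the product c is generated
    return +new_state          # unary plus drops entries with count zero

state = Counter({"a": 1})
env = {"b"}                    # !b: the consumption of b is ignored
after = fire(state, env, "a", "b", "c")
again = fire(after, env, "a", "b", "c")   # a is gone, so nothing fires
```

In this sketch the environmental resource b is never decremented, which mirrors the informal reading of !b given above, while a second application fails because the single copy of a has been consumed.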


6 For the applications of Description Logics to ontologies, see e.g. [25].
Various multiple interaction processes can then be described as a (linear
logical) formal deductive proof, where a composed interaction process can be
identified with a structurally composed logical proof. (In other words, a certain
part of the DIO process can be simulated by a formal deductive proof process.)
This suggests that some inquiries to the DIO-based ontological knowledgebase
could be treated by a logical proof-search engine for a variant of linear
logic. Indeed, we shall argue below that a suitable formulation can be
given for the basic inferences on DIO (in etc. ) in a resource-sensitive
logic. Logical representations have been used in various areas of computer science
where theorem-proving approaches are applied in order to represent processes
(cf. e.g. [9]). In particular, in this paper, we shall show that:


1. The relational-level oriented approach described by Yoshikawa et al. [27] 
[28] is reduced to the process-level approach, by the use of a logical system 
adopted for the linear logical (resource-sensitive logical) process-descriptions 
of drug interactions; 

2. A composed process-description (for drug interactions) can be formulated
by means of a composed logical proof of the resource-sensitive “linear
logic”;

3. For a negative expression, such as “inhibiting”, used in the reasoning of DIO,
we introduce a quantitative modality, in addition to the usual modality of
resource-sensitive logics, in order to adjust the standard resource-sensitive
process-description logic (such as linear logic) to the specific domain-oriented
inferences of DIO. The usual modality !A represents the existence of a
relatively large amount of resource A; hence the consumption of resource A can
be ignored when !A is used for reusable environmental resources, such
as enzymes (cf., e.g. [19]). The typical use of the standard modality is for
an environmental resource such as an enzyme, where the consumption of the
environmental enzyme during the reaction can be ignored, and the enzyme
can be considered to exist before and after the reaction. On the other hand,
we introduce a non-logical, domain-specific quantitative modality ∇,
which describes a significant decrease of resource A (written ∇A).
This modality is used for describing inhibition.


While the triadic relation (holding among the drug, environmental enzyme,
and product) was introduced as the basic relation in the DIO to derive drug-drug
interactions [28], in this paper we present a formal breakdown of this basic
relation into the process level, where the triadic relation is now described as
a resource-sensitive linear logical formula. One might then be able to fit the
inferences for the drug-drug interactions into a precise logical inference
system. In particular, this logical inference level is regarded as the process
description level (using the concurrent process description methodology of
linear logic), while preserving the basic DIO ontological modeling framework
(see Table 1). This precise logical formalization shows a way for DIO to be
designed/implemented for practical uses, based on a logical inference engine.
Table 1. The Correspondence between the basic DIO relations and our logical
process descriptions

| DIO Descriptions for Molecular Interaction | Meaning | Linear Logical Process Description |
| --- | --- | --- |
| MI(a, b, c), also written facilitate(a, b, c) | An emerged molecule a interacts with b, and c emerges as a result. From this relation, “a facilitates the generation of c (under the presence of b)” is inferred. | (a, !b) ⊸ c |
| bind_more(a, b, a * b) (previously written as bind_more(a, b, a=b) in the DIO literature) | A molecular binding of a and b, caused by the emergence of a, is relatively more frequently or durably formed than the other binding complexes with b in the scope of interest. | |
| inhibit(A, B) | The emergence of a resource A may inhibit B. | A ⊸ ∇!B |

a, b, c, ... denote molecular expressions, and A, B, C denote composed states of
molecular expressions (cf. section 3).





2 Preliminaries to Drug Interaction Ontology (DIO) 
Modeling 


2.1 Biomedical Ontologies 


In the biomedical domain, several ontologies have been developed for different
purposes, and many of them were built through the so-called “concept-centered”
or “terminology-centered” approaches. To be sure, these ontologies and/or
terminology systems are useful as repositories for tasks involving a large
number of complex technical terms and emerging new terms. However, the
concept/terminology-centered approaches do not seem very efficient when applied
to computational inferences. As a solution to this problem, we might think of
adopting basic ontological schemes in the hope that this would help us develop
well-structured knowledgebases, which are applicable to biological pathway
models, and which can deal with sophisticated medical information and so on.

One such basic ontological scheme, BFO [21], the Basic Formal Ontology
developed by IFOMIS [12] for application in the medical domain, provides two
categories, SNAP and SPAN, both of which are formulated in predicate logic:
the former corresponds typically to continuants, and the latter to occurrents or
processes. Another scheme, “The Relation Ontology”, which is reported in
Relations in Biomedical Ontologies [23], is designed to provide basic relations
that cover every granular level, namely from molecules to organisms, in the
biomedical domain. Here, biological entities are largely divided into
continuants and processes, and the relation between the two is also defined as
an instance-level primitive one: P has_participant C (P: process, C:
continuant). (Some examples of continuants and molecular level processes are
shown in the upper part of Figure 1.)

We think it is important to deal with the relation between continuant and
process in a more precise or ontological manner, because the manipulation of
their inter-relation is an essential part of communication in the discipline. We
have to keep in mind that the main topics in the discipline include such
questions as “How are new biological substances endogenously produced as a
result of a certain stimulus (an emergence of continuants)?”, or “How are
certain biological reactions/phenomena brought about or regulated?”. Causality
expressions are often used in tandem with terms describing the inter-relation,
and such terms as “facilitate” (a term indicating the positive direction), or
“inhibit” (a term indicating the negative direction) are used to refer to their
manipulations. However, as far as our investigation indicates, there is no
ontology that defines “facilitate”/“inhibit” or “facilitator”/“inhibitor” in the
context of relations between continuants and processes, beyond the
terminological level.


2.2 Drug Interaction Ontology as Molecular Level Process Ontology 


Basic Formula of Molecular Interaction Our previous work on Drug Interaction
Ontology (DIO) [27] [28] [29] can be regarded as an attempt to put
forward an ontology for primitive processes at the molecular level, using three
role relations between continuants participating in the process. As shown in
Table 1, a molecular interaction is in general represented as a triadic (role)
relation of continuants: MolecularInteraction(a, b, c) or MI(a, b, c). It can be
read as “An emerged molecule a interacts with b, and c emerges as a result”,
where a denotes an input or trigger, b denotes an (environmentally situated)
object, and c denotes an output or resultant product, respectively. In other
words, a (input) and b (object) are necessary participants to bring about the
process (enablers), while c (output) is a continuant that emerges as a result of
the process. A difference between input and object is that the latter is a
relatively “situated” continuant in the field/place of interest (e.g. a pool of
reactions such as inside the cell). Note also that the output continuant, by its
semantic definition, may be a trigger or constitute an input of another
molecular interaction process.

The triadic relation for molecular interaction, MI(a, b, c), can also be read as
“The emergence of a facilitates the emergence of c, which is mediated by situated
b”. It can also be written as facilitate(a, b, c). The latter part, “mediated by
situated b”, can sometimes be omitted. In such a case, it may also be written as
facilitate(a, c).
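
As an informal illustration of the triadic relation, the following Python sketch records MI(a, b, c) as a small immutable record and shows the two readings facilitate(a, b, c) and facilitate(a, c). The class name `MolecularInteraction`, the field names, and the helper `chain` are hypothetical names introduced here for illustration; the chained pair returned by `chain` is only our reading of the remark that an output may be the trigger of another interaction, not a rule taken from DIO.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class MolecularInteraction:
    trigger: str   # a: the input (trigger) that emerges
    obj: str       # b: the environmentally situated object, e.g. an enzyme
    output: str    # c: the resultant product

    def facilitate(self, with_mediator: bool = True) -> Tuple[str, ...]:
        """Read MI(a, b, c) as facilitate(a, b, c) or, omitting b, facilitate(a, c)."""
        if with_mediator:
            return (self.trigger, self.obj, self.output)
        return (self.trigger, self.output)

def chain(first: MolecularInteraction,
          second: MolecularInteraction) -> Optional[Tuple[str, str]]:
    """If the output of `first` is the trigger of `second`, return (input, final output)."""
    if first.output == second.trigger:
        return (first.trigger, second.output)
    return None

mi1 = MolecularInteraction("a", "b", "c")
mi2 = MolecularInteraction("c", "d", "e")
short_form = mi1.facilitate(with_mediator=False)
linked = chain(mi1, mi2)
```

Keeping the mediator b as an explicit field, rather than dropping it, preserves the information needed later when two interactions share the same situated object.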

In the Drug Interaction Ontology model, we defined different types of
interactions by comparing the existing pattern (e.g. a change of molecular
population, a change of location pattern) of participants. In this paper, we
will deal with the enzymatic catalytic reaction (substrate-enzyme reaction) and
the so-called “enzyme inhibition” reaction (inhibitor-enzyme reaction). Several
instances of these relations, both of which are subclasses of MI(a, b, c), will
be used later.


[Figure 1. Panels: I. Continuants (examples: drug and the derivatives);
II. Processes (examples of molecular level processes: molecular binding,
catalytic reaction, transportation); III. Inferences (Example A, Example B).]

Fig. 1. Inferences Using Continuants and Processes











Transient Complex This triadic model encapsulates details of filling events
during the time course of the process. For example, the formation of transient
complexes and (in some cases) their dissociation processes are encapsulated. In
physico-chemical molecular interactions, a certain kind of transient binding
complex is formed, but its lifetime may be very short. It might be represented
by something like MI_transient(a, b, a * b), if we consider the formation of a
transient complex to be the end of the reaction process. Some binding complexes
last for a relatively long time and are called “molecular bindings” or
“formation of assemblies”, which we deal with here as subclasses of the triadic
molecular interaction.

The efficiency of the formation of transient complexes depends largely on the 
quantitative chances of encountering every enabler molecule and affinity prop- 
erty. An encounter of molecules may be largely influenced by the local molecular 
population (concentration) and by other factors, such as the existence of com- 
petitive interaction counterparts. The latter factor is influenced not only by 
intrinsic molecular affinities, but also by other environmental conditions, which 
might modulate the properties. Unlike chemical reactions, which can occur under 
artificially controlled settings, biological reactions usually take place in an envi- 
ronment filled with many concomitant continuants and processes, under physi- 
ological conditions. 
As it would be almost impossible to describe every detail of an event including
all continuants and processes in a given biological environment, what we
represent as the triadic molecular interaction model, by its defined meaning, is
a dynamic event which deviates from a basal biological state. The emergence of
continuants as an input (or trigger) or an output (or resultant product)
indicates a change of biological state compared to the basal one or to the one
prior to each such emergence.


Efficiency of Interaction Processes: bind_more Relation In accordance
with the semantic nature of the model discussed above, we here introduce the
bind_more relation to describe the relative quality of reaction processes. The
relation includes the influence of competitive formation of a transient
complex. The relation bind_more(a, b, a * b) means that “the molecular binding
of a and b (a * b), caused by the emergence of a, is relatively more frequently
or durably formed than the other binding complexes with b in the scope of
interest”. This relative quality of binding is abstracted from the quantitative
information concerning key players (concentration, affinity parameters, etc.) as
well as other biological conditions which cannot be defined in every detail. To
some extent, however, this relation could be derived by comparing
pharmacokinetic/biochemical parameters, using the concentrations obtained by an
ordinary administration dose. Of our two examples below, one is known as a
“mechanism-based inhibition”, which is a tighter binding than that of any
ordinary substrate, and thus the bind_more relation is clearly manifest, as
reported in the literature. In the other example too, the bind_more relation is
adopted, since the binding complex lasts longer than those of most ordinary
substrates.


Inhibiting Relation In the literature of the biomedical domain, narrative
expressions such as “a inhibits b” are often used while ignoring the types of
interactions. In Drug Interaction Ontology, direct molecular interaction is
regarded as a subclass of triadic molecular interaction MI(a, b, c), and
represented as MI_xi(a, b, c).⁷ Here, the relation inhibit(a, b) means “The
emergence of a decreases b”. The inhibitory relations in other complex types of
interactions, or in combinations of more than two triadic interactions, are
called “inferred inhibitions” in DIO.


2.3 Examples 


Drug Interaction Between 5-FU and SRV 

Figure 2 (quoted from with some modifications) is a pathway map which
was manually created in order to explain the causal effect of the anticancer drug
5-FU (the upper map) and the effect of the concomitant use of SRV (the lower


7 x is a variable which indicates interaction modalities such as enzymatic
reactions, transformations, etc.
8 5-FU: an anticancer drug, 5-fluorouracil, SRV: an antiviral drug, sorivudine 
map). The arrows indicate the reactions, connecting node molecules (e.g.
biotransformation) corresponding to input and output. The names near the arrow
lines indicate mediators of reactions (e.g. enzymes). The broken line arrows
indicate aggregates of molecular interactions, while the solid line arrows
indicate one unit of molecular interaction in the triadic molecular interaction
model.

In this paper, we deal with two types of interactions only. Both occur in the
lower map and involve the participation of the enzyme DPD. The first
interaction, 5-FU → H2FUra (mediated by DPD), is modelled by the expression
MI_es(5-FU, DPD, H2FUra). MI_es(a, b, c) is a subclass of MI(a, b, c) and an
abbreviation of “The emergence of a substrate a triggers a reaction catalyzed by
a situated enzyme b and a is converted to c as a result”.

The second interaction, shown in the lower map as something like BVU --| DPD
(down arrow), is represented by the expression MI_ei(BVU, DPD, BVU * DPD).
MI_ei(a, b, c) is another subclass of MI(a, b, c). It can be read as “The
emergence of a triggers a reaction owing to which a situated enzyme b is less
populated or less capable of reaction as a result, compared to the state prior
to the reaction”. In this example, an irreversible binding formation
(BVU * DPD) is confirmed. It can also be read as
bind_more(BVU, DPD, BVU * DPD) in this example. This is based on the nature of
the so-called “mechanism-based inhibition”, as opposed to the enzyme-substrate
reaction, in the case of the usual oral administration dose under physiological
conditions.

It is known that the former process is inhibited (or made less effective) when
the latter reaction is also occurring. Intuitively, “BVU inhibits (mediated by
DPD) H2FUra formation”; that is, it inhibits 5-FU's detoxification process.
In section 4, we will provide a logical reasoning system representing the inhibit
relation.


Drug Interaction Between CPT-11 and Ketoconazole The drug interaction
between CPT-11 and Ketoconazole was treated by Yoshikawa et al. as
an example of the model [27]. An illustration of the pathway map of that
interaction is shown in Section 4. This example is also adopted to explain the
drug-drug interaction behind the effect of the concomitant use of CPT-11 and
Ketoconazole. A summary biochemical statement in this example may be
expressed as “transformation of CPT-11 is interfered with by Ketoconazole
through the modulation of the enzyme CYP3A4”. Its pharmacological semantics may
be read as “The activity of the anticancer drug CPT-11 is elevated”, or “The
toxicity of CPT-11 is elevated” for short.

In this paper, we deal with local biochemical semantics by extracting only
two molecular interactions that are conjoined to each other. One is
CPT-11 → APC, which is mediated by CYP3A4: MI_es(CPT-11, CYP3A4, APC).
The other is Ketoconazole --| CYP3A4:
MI_ei(Ketoconazole, CYP3A4, Ketoconazole * CYP3A4). Although the interaction
modality of Ketoconazole with CYP3A4 differs from that in the above
“mechanism-based inhibition”, the binding between Ketoconazole and CYP3A4 holds
longer and tighter than the binding between CPT-11 and CYP3A4. Therefore, the
bind_more relation could also be applied in this case. These reactions will be
taken up in Section 4 as examples of our logical representation of drug
interactions.

Fig. 2. 5-FU Associated Pathways and the Influence of Addition of SRV


3 Preliminaries to Linear Logical Inference Systems for 
Process Descriptions 


In this section we introduce our process description language, a fragment of
linear logic.⁹ First, we introduce the vocabulary of our process description
language as follows:

(1) Logical connectives:
      A ⊗ B   (“the molecular binding of A and B”),
      (A, B)  (“the co-existence of A and B”),
      A ⊸ B   (“if A is added, B is generated”),
      !A      (“A exists as an environmental resource”);
(2) Additional non-logical modal connective:
      ∇A      (“A decreases”);
(3) Molecular expressions: A, B, C, ..., A0, A1, A2, ...
    Molecular variables: a, b, c, ..., a0, a1, a2, ...

The outermost parentheses are often deleted. For example, A ⊗ B ⊸ C is a
molecular expression. (((A, B), C), ...) can be abbreviated as (A, B, C, ...) or


9 We give a preliminary explanation of linear logic, which is the minimal
information needed for understanding this paper; a more detailed and formal
introduction may be found in e.g. [19], [4]. (See also [3], [18], [20].)
just A, B, C, .... A1, ..., An ⊢ B1, ..., Bm is called a sequent, which may be
paraphrased informally as “If A1, ..., An co-exist then B1, ..., Bm are
generated by consuming A1, ..., An”. A finite sequence of formulas (possibly the
empty sequence) is denoted by Γ, Δ, .... Parentheses occurring in a formula may
be deleted when this causes no ambiguity. A sequent is an expression of the form
Γ ⊢ Δ.

There are two kinds of inference rules: 


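$$
\frac{\Gamma_1 \vdash \Delta_1}{\Gamma' \vdash \Delta'}
\qquad\qquad
\frac{\Gamma_1 \vdash \Delta_1 \qquad \Gamma_2 \vdash \Delta_2}{\Gamma' \vdash \Delta'}
$$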





The former has only one upper sequent Γ1 ⊢ Δ1, while the latter has two upper
sequents Γ1 ⊢ Δ1 and Γ2 ⊢ Δ2. Both have only one lower sequent Γ' ⊢ Δ'. We
also consider a special kind of inference rule for which there is no upper
sequent. Such a special kind of inference rule is called an “axiom sequent”.

In the traditional logics, including the classical predicate logic and construc- 
tive logic, the following logical inference is admitted as a valid inference: 


$$
\frac{C \to A \qquad C \to B}{C \to A \text{ and } B}
$$


Here, A → B is read “If A then B”. The above inference thus says: From
the two assumptions “If C then A” and “If C then B”, one can reason “If C
then A and B”. This is obviously true for the usual mathematical reasoning. For
example,

$$
\frac{f(x) < a \to b < x \qquad f(x) < a \to x < c}{f(x) < a \to b < x \text{ and } x < c}
$$


However, when we try to apply this inference rule to the two premises; “If one 
has one dollar then one gets a chocolate package” and “If one has one dollar 
then one gets a candy package”, then one may have the following inference: 


$$
\frac{\text{one has \$1} \to \text{one gets a chocolate} \qquad \text{one has \$1} \to \text{one gets a candy}}{\text{one has \$1} \to \text{one gets a chocolate and a candy}}
$$


A naive reading of this inference leads to the wrong conclusion “If one has one
dollar then one gets both a chocolate package and a candy package”. In fact, the
following are implicitly assumed when the traditional logical rules are applied
to statements: (1) the statements are independent of temporality, i.e., the
logic treats only “eternal” knowledge, which is independent of time; (2) the
logical implication “→” is independent of any consumption relation or any causal
relation. These assumptions are appropriate when we treat ordinary mathematics.
Hence, the traditional logical inferences can be used for mathematical
reasoning in general. However, when we would like to treat concurrency-sensitive
matters or the resource-consumption relation, we need to be careful with the
application of the logical inference rules. The above example illustrates one
such situation. In particular, when we would like to study the mathematical
structures of information or computation in computer science, information
science, etc., we often need to elaborate the traditional logical inferences,
since concurrency-sensitive settings and concepts for the consumption of
computational resources or of resources for information processing are often
essential in computer science and related fields. Linear logic, proposed by
Girard, is considered one of the basic logical systems that provide a logical
framework for such new situations occurring in computer science and its related
fields.¹⁰ For example, instead of the traditional logical connective ∧ (“and”),
linear logic provides two different kinds of logical connectives, ⊗ and &, where
A ⊗ B means “A and B hold in parallel (at the same time)” while A & B means
“Either A or B can be chosen to hold (as you like), but only one of them at
once”.

The traditional logical implication → is replaced by the linear implication
⊸, where A ⊸ B means “By the consumption of A, B is generated”. With the
explicit appearance of the resource consumption relation, the conjunction “A
and B” naturally yields the co-existence of A and B. We use “comma” (“A, B”)
to denote the co-existence of A and B. We also introduce a stronger notion of
co-existence, namely, that of the molecular binding “A ⊗ B”.¹¹

Following [27], we take as a basic primitive relation the specific triadic
relation called “the facilitate relation” (facilitate(A, B, C)). (“A drug A
under the environmental resource !B (such as enzymes) generates C” is expressed
as (A, !B) ⊸ C in the logical sequent from the two transitions.) In this paper,
we express this relation on a process description level as a linear logical ⊢
relation;
The logical inference rule for “,” is:

$$
\frac{C \vdash A \qquad D \vdash B}{(C, D) \vdash (A, B)}
$$


When we apply “If one has $1 then one gets a chocolate package” and “If one
has $1 then one gets a candy package” to the two premises of the above inference
rule for “,”, then we obtain as a conclusion “one has ($1, $1) ⊸ one gets (a
chocolate, a candy)”, which means “If one has two $1's (namely, $2) then one
gets both a chocolate package and a candy package at the same time”.
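
The difference between the naive and the resource-sensitive reading can be checked with a small sketch. The following Python fragment is only an illustration under our own assumptions (the function name `apply_rules` and the string labels are hypothetical): each rule consumes one copy of its premise from a multiset of resources, so one dollar cannot pay for both packages, whereas two dollars can.

```python
from collections import Counter

def apply_rules(resources, rules):
    """Fire each rule once, in order, consuming its premise from the multiset.

    Returns the multiset of generated items, or None if some premise is missing.
    """
    remaining = Counter(resources)
    gained = Counter()
    for premise, conclusion in rules:
        if remaining[premise] < 1:
            return None            # the resource has already been spent
        remaining[premise] -= 1
        gained[conclusion] += 1
    return gained

rules = [("$1", "chocolate"), ("$1", "candy")]
one_dollar = apply_rules(["$1"], rules)          # not enough resources
two_dollars = apply_rules(["$1", "$1"], rules)   # both packages are obtained
```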

On the other hand, an infinite amount of a resource A is expressed as !A,
with the help of the modal operator ! in linear logic. (!A is such a resource
that one can consume A as many times as one wants without any loss of !A.) By
using this modal operator ! one can express the traditional logical truth (i.e.,
eternal truth) within the framework of linear logic. Hence, linear logic
contains the traditional logic (with the help of the modal operator), and linear
logic is considered a refined (or fine-grained) form of the traditional logic,
rather than a logic different from the traditional logic.


10 There are some other approaches in which the traditional first order logic is refined 
in order to capture actions and changes of states. Situation calculus proposed by J. 
McCarthy [15] is one such approach. 

11 Although in the original notation of linear logic, the symbol ⊗ is used for the
parallel operator, we use a slightly different symbolism for it in this paper.
(See [19], [4].)
Note that the linear logical implication A ⊸ B means that “By consuming
A, B is generated.” Hence, when A exists and A ⊸ B holds, then B can
actually be generated at the expense of the resource A.
On the other hand, the traditional logical implication, such as the one used in
the above mathematical reasoning, does not consume the premise. That is to say,
the traditional implication A → B means that when A holds and A → B holds,
then B holds, where A continues to hold even after B is obtained from A and
A → B. (For a precise list of the formal inference rules, see the Appendix at the
end of the paper.)


The linear logical modal operator !A usually stands for an infinite amount
of a resource A. If a resource A used in an inference is not resource-sensitive,
it may be interpreted as a resource that can be repeatedly used without any
loss, i.e., “A holds” in the traditional logical sense may be interpreted as
“There is an infinite amount of A available”, that is, !A in our symbolism.


The traditional implication may be expressed by the linear implication (the
resource-consumptional implication) with the bang operator !; thus A → B,
for example, may be represented by the linear logical formula (!A) ⊸ B.


Accordingly, the standard rules for the bang-modal operator ! are formulated 
as follows: 


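$$
\frac{A, \Gamma \vdash \Delta}{!A, \Gamma \vdash \Delta}\ \text{!-left (dereliction-left)}
\qquad\qquad
\frac{!A, !A, \Gamma \vdash \Delta}{!A, \Gamma \vdash \Delta}\ \text{(contraction-left)}
$$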


In the DIO-style reasoning, users sometimes wish to obtain certain inhibiting- 
information. For example, consider the following setting. The production of a is 
normally generated by a drug b. But, with the use of another drug c in the same 
environment, the production of a is inhibited, or in other words, the amount of 
the production of a is substantially decreased due to the use of c. 


To deal with such a case, we introduce a new modal operator, called
the quantitative modality, in symbols ∇.
It represents our thinking about the inhibiting effects in the DIO-style
reasoning.


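$$
\frac{A, \Gamma \vdash \Sigma}{\nabla A, \Gamma \vdash \nabla\Sigma}\ \nabla\text{-left}
\qquad\qquad
\frac{\Gamma \vdash \Sigma, !A}{\Gamma \vdash \Sigma, A, \nabla !A}\ \nabla\text{-right}
$$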





Now a careful reader might wonder about the consistency of such rules. The
introduction of such inference rules, however, does not affect the consistency of
the logical inference rules, as stated in the following Proposition.


Proposition: Modality rules for ∇ are consistent.


Proof
If the weak modality symbol ∇ is deleted, the new rules become derived rules of
the fragment of the original linear logic. This means that the consistency
problem of our logical inference system with the new quantitative modal operator
is reduced to the consistency problem of the original linear logic. Since the
original linear logic is known to be consistent, our new rules for the
quantitative modal operator are consistent.














4 A Linear Logical Formulation of Basic Relations in DIO 


4.1 Basic Relations 


The triadic relation facilitate(a, b,c) which is explained in section 2.2 is con- 
sidered as a consumption process (i.e. input or object may be consumed in this 
process and generate output). We have formulated this consumption relation by 
a linear logical consumption relation. For convenience, we used the following 
abbreviations: 


a1 : input
a2 : object
a3 : output


Now the triadic relation of facilitate(a1, a2, a3) is logically described as follows: 


(a1, !a2) ⊸ a3


Then using linear logical inferences, one can obtain the following general 
abstract Lemma. 


Lemma 1
!a2, facilitate(a1, a2, a3) ⊢ a1 ⊸ a3

This means that if there is an environmental resource !a2 and if
facilitate(a1, a2, a3) holds, then if a1 is added, a3 is generated (addition of
a1 generates a3).


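Proof: the following is a formal linear logical proof of this Lemma.

$$
\dfrac{
  \dfrac{
    \dfrac{a_1 \vdash a_1 \qquad !a_2 \vdash !a_2}{a_1, !a_2 \vdash (a_1, !a_2)}\ \text{parallel}
    \qquad a_3 \vdash a_3
  }{a_1, !a_2, ((a_1, !a_2) \multimap a_3) \vdash a_3}\ \multimap\text{-left}
}{!a_2, ((a_1, !a_2) \multimap a_3) \vdash a_1 \multimap a_3}\ \multimap\text{-right}
$$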


We show some examples of concrete applications of the above Lemma. 


Example 4.1.1: CPT-11 is catalyzed by CE and converted to SN38

It is known that this actually holds in a certain part of the human liver. CPT-11
is an anti-cancer drug, which is also known as Irinotecan.¹² CE¹³ is an enzyme
which exists ordinarily in the human liver, and SN38¹⁴ is a drug-derived
substance which is generated as a result of the CE-mediated catalytic reaction.
We regard a1 as CPT-11, a2 as CE, and a3 as SN38. Suppose that


1. !a2 exists as an environmental resource; and
2. facilitate(a1, a2, a3) actually holds.


Then the two premises of the Lemma, !a2 and facilitate(a1, a2, a3), hold.
Hence, the Lemma tells us that if CPT-11 is given, SN38 is generated.
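
The use of Lemma 1 in this example can be mimicked operationally. The following Python sketch is a hedged illustration, not the proof-search engine itself: `derivable_products` is a hypothetical helper that, given the set of environmental ("!") resources and a list of facilitate triples, returns the outputs obtained when a given input is added.

```python
def derivable_products(env, facilitate_facts, added):
    """Operational reading of Lemma 1.

    env:              set of environmental resources (those marked with '!')
    facilitate_facts: list of (a1, a2, a3) triples that are assumed to hold
    added:            the input a1 that is added
    Returns the list of outputs a3 for which both premises of the Lemma hold.
    """
    return [a3 for (a1, a2, a3) in facilitate_facts
            if a1 == added and a2 in env]

facts = [("CPT-11", "CE", "SN38")]
with_enzyme = derivable_products({"CE"}, facts, "CPT-11")
without_enzyme = derivable_products(set(), facts, "CPT-11")
```

When CE is absent from the environment, the first premise of the Lemma fails and nothing is derived.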


Example 4.1.2: 5-FU is catalyzed by DPD and converted to H2FUR

It is known that this actually holds in a certain part of the living human body.
5-FU¹⁵ is an anti-cancer drug, used to treat some types of cancer. DPD¹⁶
is an enzyme that exists mainly in the liver. H2FUR, on the other hand,
is a drug-derived substance that is generated as a result of the catalytic
reaction mediated by DPD.

As in Example 4.1.1, we regard a1 as 5-FU, a2 as DPD, and a3 as H2FUR. By
Lemma 1, if


1. !a2 exists as an environmental resource; and
2. facilitate(a1, a2, a3) actually holds,


then a1 ⊸ a3, i.e. “if 5-FU is given, H2FUR is generated”, can be deduced in a
logical proof.


12 topoisomerase-I inhibitor,
7-ethyl-10-[4-(1-piperidino)-1-piperidino]-carbonyloxycamptothecin

13 carboxylesterase

14 7-ethyl-10-hydroxycamptothecin

15 5-fluorouracil

16 dihydropyrimidine dehydrogenase
4.2 Biomolecular Bindings Relations


To describe biomolecular bindings, the ⊗ symbol (called tensor) is used as a
connective.


Rule: Biomolecular Binding
We express biomolecular bindings as follows:


$$(a_1, a_2) \multimap a_1 \otimes a_2$$


This means that if a1 and a2 exist, and it is known that they will actually bind
together, then the consumption of a1 and a2 will generate the bound molecule
a1 ⊗ a2.


Example 4.2.1: Ketoconazole binds to CYP3A4 

Ketoconazole, an anti-fungal drug, is known to be slowly metabolized by
CYP3A4, forming stable complexes. CYP3A4¹⁷ is one of the so-called drug
metabolizing enzymes, which mainly exist in the human liver.

We express this as follows:


$$(\mathrm{Ketoconazole}, \mathrm{CYP3A4}) \multimap \mathrm{Ketoconazole} \otimes \mathrm{CYP3A4}$$


This means that if Ketoconazole and CYP3A4 co-exist, Ketoconazole and
CYP3A4 will bind together.


Example 4.2.2: BVU binds to DPD 
BVU (bromovinyl uracil) is a drug-derived substance and binds to the enzyme
DPD. We express this as follows:


$$(\mathrm{BVU}, \mathrm{DPD}) \multimap \mathrm{BVU} \otimes \mathrm{DPD}$$


This means that if BVU and DPD co-exist, BVU and DPD will bind together.
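The binding rule can be read as a consumption step in the same informal multiset style. The sketch below is an illustration under assumptions made here: the function name `bind_step` is hypothetical, both molecules are treated as ordinary (consumable) resources as in the rule above, and a Python tuple stands in for the bound molecule a1 ⊗ a2.

```python
from collections import Counter

def bind_step(state, a1, a2):
    """Apply (a1, a2) -o a1 (x) a2 once: both molecules are consumed, the complex is produced."""
    if state[a1] < 1 or state[a2] < 1:
        return None  # one of the two molecules is missing
    new_state = Counter(state)
    new_state[a1] -= 1
    new_state[a2] -= 1
    new_state[(a1, a2)] += 1  # the tuple stands for the bound molecule
    return +new_state  # drop entries whose count fell to zero

before = Counter({"Ketoconazole": 1, "CYP3A4": 2})
after = bind_step(before, "Ketoconazole", "CYP3A4")
```

In this toy state, one copy of CYP3A4 is tied up in the complex and one copy remains free, which is the intuition that the inhibition rules of the next subsection build on.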


4.3 Inhibiting Relations 


For inhibiting relations, we use the quantitative modality operator ∇ as
introduced in Table 1.


Modality Rule I: ∇-right


$$\dfrac{\Gamma \vdash \Sigma, !a}{\Gamma \vdash \Sigma, a, \nabla !a}\ \nabla\text{-right}$$


17 cytochrome P-450 isoform 3A4 
An environmental resource !a can be considered as the sum of a and ∇!a. This
means that if a part a of the environmental resource !a is used, then the
environmental resource is consumed, and the amount of usable environmental
resource !a will decrease. This inference rule will be used to express the basic
“inhibition” relation in DIO: a use of !a which results in a product which may
inhibit !a.

Modality Rule II: ∇-left


$$\dfrac{a, \Gamma \vdash b_1, b_2, \ldots, b_n}{\nabla a, \Gamma \vdash \nabla b_1, \nabla b_2, \ldots, \nabla b_n}\ \nabla\text{-left}$$





If the effect of a decreases under the same context Γ, the effects of the products
{b1, b2, ..., bn} may be affected. We define inhibit(A, B) as A ⊸ B, using ∇.
Using the above rules, we can infer the inhibiting relations. For convenience, we
also use the following abbreviations.


a1 : Drug1
a2 : Enzyme
a3 : Product
a4 : Drug2


Lemma 2: 


$$(a_4, a_2) \multimap (a_4 \otimes a_2),\ !a_2 \vdash \mathrm{inhibit}(a_4, !a_2)$$


This means that if there is an environmental resource !a2 and if bind(a4, a2)
actually holds, then if a4 is added, the bound molecule a4 ⊗ a2 and the
decreased a2 are generated. (The addition of a4 generates the decreased a2.)


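Proof:

$$
\dfrac{
  \dfrac{
    \dfrac{
      \dfrac{
        a_4 \vdash a_4 \qquad
        \dfrac{!a_2 \vdash !a_2}{!a_2 \vdash \nabla !a_2, a_2}\ \nabla\text{-right}
      }{a_4, !a_2 \vdash \nabla !a_2, (a_4, a_2)}\ \text{parallel-right}
      \qquad a_4 \otimes a_2 \vdash a_4 \otimes a_2
    }{(a_4, a_2) \multimap (a_4 \otimes a_2), a_4, !a_2 \vdash a_4 \otimes a_2, \nabla !a_2}\ \multimap\text{-left}
  }{(a_4, a_2) \multimap (a_4 \otimes a_2), a_4, !a_2 \vdash \nabla !a_2}\ \text{weakening-right}
}{(a_4, a_2) \multimap (a_4 \otimes a_2), !a_2 \vdash a_4 \multimap \nabla !a_2}\ \multimap\text{-right}
$$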


Example 4.2.1: Ketoconazole may decrease the amount of CYP3A4 
We use the following abbreviations. 


a2 : CYP3A4
a4 : Ketoconazole


632 Mitsuhiro Okada et al. 


We can see that


1. !a2 really exists as an environmental resource; and
2. a2 and a4 are actually bound together, namely bind(a4, a2) holds.


Therefore, the two premises of Lemma 2 hold. Then it can be proved that “a4
may decrease the amount of a2”.


Example 4.2.2: BVU may decrease the amount of H2FUR 


In the same way as in Example 4.2.1, we use the following correspondence Table.


a2 : DPD
a4 : BVU


We can see that


1. !a2 really exists as an environmental resource; and
2. a2 and a4 are actually bound together, namely bind(a4, a2) holds.


Then, the two premises of Lemma 2 hold. Therefore, it can be proved
that “a4 may decrease the amount of a2”.


By using Lemma 2, we can infer Lemma 3 below.

Lemma 3:


$$\mathrm{bind}(a_4, a_2),\ !a_2,\ \mathrm{facilitate}(a_1, a_2, a_3) \vdash \mathrm{inhibit}((a_1, a_4), \nabla a_3)$$


This means that if bind(a4, a2) actually holds, and if there is an environmental
resource !a2, and if facilitate(a1, a2, a3) actually holds, then if a1 and a4 are
added, the decreased a3 is generated (the addition of a1 and a4 generates the
decreased a3).


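Proof:

$$
\dfrac{
  \dfrac{
    \dfrac{\text{(from Lemma 2)}}
          {\mathrm{bind}(a_4, a_2), a_4, !a_2 \vdash \nabla !a_2}
    \qquad
    \dfrac{!a_2, a_1, \mathrm{facilitate}(a_1, a_2, a_3) \vdash a_3}
          {\nabla !a_2, a_1, \mathrm{facilitate}(a_1, a_2, a_3) \vdash \nabla a_3}\ \nabla\text{-left}
  }{\mathrm{bind}(a_4, a_2), a_1, a_4, !a_2, \mathrm{facilitate}(a_1, a_2, a_3) \vdash \nabla a_3}\ \text{cut}
}{\mathrm{bind}(a_4, a_2), !a_2, \mathrm{facilitate}(a_1, a_2, a_3) \vdash (a_1, a_4) \multimap \nabla a_3}\ \multimap\text{-right}
$$

where bind(a4, a2) is the abbreviation of (a4, a2) ⊸ (a4 ⊗ a2), and
facilitate(a1, a2, a3) stands for (a1, !a2) ⊸ a3.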


Example 4.3.1: Ketoconazole may inhibit the generation of APC 


We use the following correspondence Table.

a1 : CPT-11
a2 : CYP3A4
a3 : APC
a4 : Ketoconazole


APC¹⁸ is a drug-derived substance which is generated in this bio-molecular process.
We can see that


1. !a2 really exists as an environmental resource; and
2. facilitate(a1, a2, a3) actually holds; and
3. a4 and a2 are actually bound together, namely bind(a4, a2) holds.


Then the premises of Lemma 3 hold. Therefore, it can be proved that “a4 may inhibit
the generation of a3 in the presence of a1”.


Remark I: Non-monotonic Reasoning. Notice that non-monotonic reasoning
is used in these inferences regarding the presence of a4 (Ketoconazole in the above
example) in the resource. The proof of a3 is replaced by a proof of ∇a3 under the
assumption of facilitate(a1, a2, a3) and giving a1 with !a2. This is one of the essential
features of inhibition-related reasoning in DIO.
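The non-monotonic behavior noted in Remark I can be made concrete with a purely qualitative sketch. It does not implement the sequent calculus; it only tabulates which conclusion Lemma 1 or Lemma 3 licenses for a given set of added inputs. The function name `conclusion_about_a3` and the string labels are assumptions introduced for this illustration.

```python
def conclusion_about_a3(added, env_a2=True, facilitate=True, bind=True):
    """Return the qualitative conclusion about a3 licensed by Lemma 1 or Lemma 3.

    `added` is the set of inputs given to the process ("a1", "a4").
    The keyword flags say which premises of the lemmas are assumed to hold.
    """
    if not (env_a2 and facilitate) or "a1" not in added:
        return None  # neither lemma applies
    if bind and "a4" in added:
        return "decreased a3"  # Lemma 3: (a1, a4) -o nabla a3
    return "a3"  # Lemma 1: a1 -o a3

without_drug2 = conclusion_about_a3({"a1"})
with_drug2 = conclusion_about_a3({"a1", "a4"})
```

Adding a4 to the inputs does not add a conclusion to the earlier one; it replaces the conclusion a3 by the decreased a3, which is what makes the reasoning non-monotonic.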


Remark II: Domain Specificity of the Quantitative Modality ∇A.
Although the introduction of the quantitative modality ∇A keeps the consistency
of the logical proof system (as was shown in Proposition 1 in Section 3), this new
modality is very domain-specific and destroys the basic universal structure of logical
syntax, namely cut-eliminability. In fact, the above proof of Lemma 2 serves as a
counter-example of the cut-elimination theorem.


Example 4.3.2: BVU may inhibit the generation of H2FUR 


In the same way as in Example 4.3.1, we use the following correspondence Table.


a1 : 5-FU
a2 : DPD
a3 : H2FUR
a4 : BVU


We can see that

1. !a2 really exists as an environmental resource; and
2. facilitate(a1, a2, a3) actually holds; and
3. a2 and a4 actually bind together; namely, bind(a4, a2) actually holds.


Then the premises of Lemma 3 hold. Therefore, it can be proved that
“a4 may inhibit the generation of a3 in the presence of a1”.


18 7-ethyl-10-[4-N-(5-aminopentanoic acid)-1-piperidino]-carbonyloxycamptothecin
4.4 Remark on Logically Higher-Level Reasoning


For the implementation of highly complicated pharmacological relations, we
claim that meta-level inferences on proofs, rather than on assertions, are required.
Our logical formalism could be used as a tool to classify relations into different
levels. In particular, we can clarify that some reasoning of DIO is not reasoning
at the assertion level but reasoning at the meta-logical level, namely at the level
of inferences on proofs. Here, we show an example of such meta-level reasoning
in DIO.

Using the proof of Lemma 1 and the proof of Lemma 2, we can obtain the
Meta-reasoning I given below by meta-level reasoning. (Notice: the meta-level
inference is shown by the bold line, and the inference requires two proofs
(or, under the linear logical proofs-as-composed-processes identification, two
composed processes) as its premises.)


Meta-reasoning I 


la4 Hag r 
—> vright 
agH aş laş F Vlag, a4 
ag, !a4 F Vlag, (ag, a4) (ag, a4) ag Q a4 la4, a1, ((a1,!a4) — a5) F- a5 
aj Hay fag Hag ag, !a4 F ag @ ag, Vlag Viag,a1,((a1,!a4) — a5) F Vas 
a1, !ag F (a1,!a2) agt ag a1, ag, !a4, ((a1, !a4) — a5) F ag Q a4, Yag 
a1, !ag, ((a1, !a2) — ag) F ag a1, ag, !a4, (a1, !a4 — a5) F Vas 
la2, ((a1,!a2) — ag) F ay — ag !a4, (a1, !a4 — a5) F (a1, ag) — Vas 


lag, !a4, ((a1,!a2) — a3), ((as, a4) — (as @ a4)), ((a1, !a4) — a5) F (a1, a6) — Aag 


As an example, we take the Ketoconazole - CPT-11 interaction process. We use
the following correspondence Table.


a1 : CPT-11
a2 : CE
a3 : SN-38
a4 : CYP3A4
a5 : APC
a6 : Ketoconazole


Here, we explain this situation briefly. CPT-11 generates SN-38 in the
situated environment of the CE enzyme, while CPT-11 generates APC in the
situated environment of CYP3A4, in some region of the liver. Those facilitation
processes are described as linear logical proofs in Subsection 4.1 (which is
represented by the left-upper proof in the Meta-reasoning I). On the other hand, the
existence of Ketoconazole inhibits CYP3A4, which we have formally described
as a linear logical proof with the quantitative modality ∇ in the previous
subsection (which is represented by the right-upper proof in the Meta-reasoning I).
Then, one can reason about the interaction between the CPT-11 - CE - SN-38
facilitation process and the CPT-11 - CYP3A4 - Ketoconazole - APC inhibition
process, which results in a further facilitation of the CPT-11 - CE - SN-38
process (because the resource CPT-11 can be used for generating SN-38 once the
consumption of CPT-11 for generating APC is inhibited).


[Figure: diagram with nodes labeled CPT-11, drug I, and drug II]

Fig. 3. CPT-11 and Ketoconazole Interaction Process











5 Discussion and Conclusion 


5.1 Discussion on the Symbolic Inference Methods and the 
Simulation Methods for Drug-Interaction Knowledge 


We introduced a fragment of a logical inference system for the symbolic reasoning 
of drug interaction knowledge, where the usual treatment using the relations- 
based (or equivalently predicates-based) reasoning is analyzed into more prim- 
itive process description-based reasoning, using a variant of resource-sensitive 
logic. 

A symbolic logical reasoning encapsulates concrete numerical values and 
the detailed (e.g. chemical) levels of processes, and it reasons about the drug- 
interaction processes on a certain abstract level, which could make the compu- 
tational processing less costly and which efficiently provides important informa- 
tion. Of course, such abstract and symbolic approaches have some drawbacks: 
the results of inquiries depend on the way of abstracting the real concrete situ- 
ations in the organic cells, and it might sometimes cause the validation problem 
concerning an abstract modeling process. 

Here in DIO, we take the ontological methodology proposed by Yoshikawa
et al. [27] [28] [29], where some specific way of abstraction and some symbolic
way of reasoning/inferring are claimed to be ontologically essential and useful
for the knowledge of drug-interaction processes. On the other hand, following a
different approach, such as the simulation method based on numerical data from
various in vitro experiments, it would be possible to realize a wide range of
interaction processes, e.g. in a cell. However, for human response to drugs (including
drug-drug interactions), the results obtained with the simulation model using
such numerical data have not been satisfactory so far, in view of quantitative
prediction.

Moreover, the setting of a simulation model is usually more complex and 
computationally costly. In fact, it would be ideal if one could combine the two 
approaches, the symbolic reasoning system and the simulation system, to ob- 
tain useful information such as individual differences of drug responses. For 
example, for a first estimation of possible drug-drug interactions or drug ef- 
fects, one would like to use a symbolic reasoning/inference system, while when 
some important side-effects of multiple use of drugs are found by the symbolic 
reasoning/inference system, one would like to re-examine them with more com- 
putational cost/resource by a simulation system taking into account quantitative 
matters. 


5.2 Concluding Remarks 


We have utilized a variant of linear logic, namely a resource-sensitive logic, to 
formulate some of the basic inferences used in our previous work on Drug Inter- 
action Ontology (DIO). In particular, we have obtained the following results: 


1. The original relational approach of DIO could be logically grounded in terms 
of the logical description level of the interaction processes, by the use of linear 
logical process descriptions. 

2. The informal arguments for the basic information on facilitation, 
inhibition, molecular binding, etc. used in the former work of DIO are now 
described at the logical proof level. 

3. A linear logical proof has a direct meaning in DIO as an interaction process. 
A complex process corresponds to a composite proof-structure. 


In the course of our formulation of the logical language for the
interaction-process descriptions, we introduced a new modality, the quantitative
modality, in order to reason about a certain negative effect (inhibition, i.e., the
decreasing tendency of a product due to the interference of another co-existent
component). Our logic for representing process descriptions has the characteristics
of non-monotonic reasoning and resource-sensitive reasoning. It seems that both
characteristics are necessary for reasoning at the process description level
of knowledge/information based on a drug-interaction ontology (DIO) model.


5.3 Future Work 
We list here some items on which we plan to work in the future. 


1. We plan to extend the basic part of our reasoning/inference system to a 
further range of drug-interaction ontology and related biomedical ontologies. 

2. We also plan to build a combined framework of the symbolic inference
method proposed in this paper and the simulation method mentioned earlier
in this section, to obtain an integrated framework for the drug-interaction
knowledge/information tool.

3. We plan to apply our linear logical process description framework to further
ranges of process-based ontologies. For the setting of process ontology modeling,
it would be useful to investigate the relationship between the traditional
relation/predicate-based descriptions (whose reasoning basically follows
traditional first-order logic) and our process-based descriptions (whose
reasoning follows linear logic and its variants, as explained in this paper). In
fact, the formal (logical) process descriptions introduced in our linear logical
language framework would be useful to define process-related interactive
relations, such as facilitate, bind_more, inhibit, etc., while, once precisely
defined, the precise formal definition and the precise operational meaning of
such relations may be encapsulated for some simple queries on DIO. In fact,
the statements of Lemmas 1, 2, and 3 have the form of Horn clauses once those
relations are defined by linear logical formulas. In such cases, one could make
use of a traditional predicate logical inference engine, while it is necessary
to return to the precise resource-sensitive (linear) logical inference engine for
more resource-sensitive queries. So, interconnecting the traditional relational
(and predicate-logical) approach and the linear logical approach would be
useful for the practical purposes of DIO. Logic programming language
frameworks have been well investigated in former work (by Dale Miller’s group
and others [10] [16]). Such logic programming frameworks might be useful
for this direction of research.

4. We have not developed semantics for our system yet, except for the
underlying operational semantics of our proof-syntax (of [19]). We plan to develop
phase semantics by introducing the semantic denotation of ∇A (“significantly
small amount of A”).
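The Horn-clause reading mentioned in item 3 can be sketched as naive forward chaining over relation-level facts. This is a minimal illustration under assumptions made here, not the inference engine of DIO: the fact encoding as tuples, the predicate names `env`, `generates`, and `may_inhibit`, and the functions `lemma1`, `lemma3`, and `closure` are all hypothetical.

```python
def lemma1(facts):
    """Horn reading of Lemma 1: env(a2) and facilitate(a1, a2, a3) give generates(a1, a3)."""
    out = set()
    for f in facts:
        if f[0] == "facilitate":
            _, a1, a2, a3 = f
            if ("env", a2) in facts:
                out.add(("generates", a1, a3))
    return out

def lemma3(facts):
    """Horn reading of Lemma 3: bind(a4, a2), env(a2), facilitate(a1, a2, a3) give may_inhibit(a4, a1, a3)."""
    out = set()
    for f in facts:
        if f[0] == "facilitate":
            _, a1, a2, a3 = f
            for g in facts:
                if g[0] == "bind" and g[2] == a2 and ("env", a2) in facts:
                    out.add(("may_inhibit", g[1], a1, a3))
    return out

def closure(facts, rules):
    """Apply the rules repeatedly until no new fact is derived."""
    facts = set(facts)
    while True:
        new = set().union(*(rule(facts) for rule in rules)) - facts
        if not new:
            return facts
        facts |= new

facts = {
    ("env", "CE"),
    ("env", "CYP3A4"),
    ("facilitate", "CPT-11", "CE", "SN38"),
    ("facilitate", "CPT-11", "CYP3A4", "APC"),
    ("bind", "Ketoconazole", "CYP3A4"),
}
derived = closure(facts, [lemma1, lemma3])
```

Such a monotonic engine keeps both the fact that CPT-11 generates APC and the fact that Ketoconazole may inhibit that generation. Deciding how the resource is actually used is the kind of resource-sensitive query for which the text says one must return to the linear logical inference engine.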
Appendix I


Formal Rules for a Fragment of Linear Logic with Quantitative 
Modality (LLQ) 


The following are formal rules for a fragment of Linear Logic with Quantitative
Modality (LLQ), which is used in this paper. For further basic background on
linear logic and process descriptions with linear logic, see [19].


Definition 1 (Inference rules for LLQ). Below, A and B represent arbitrary
molecular expressions and Γ, Δ, Σ, Π represent arbitrary (finite) sequences
of molecular expressions, including the case of an empty sequence. A
sequent A1, ..., An ⊢ B1, ..., Bm means informally (A1, ..., An) ⊸ (B1, ..., Bm),
namely, if A1, ..., An are given at once, then B1, ..., Bm are generated by
consuming A1, ..., An.


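- Axiom sequent

Logical axiom sequent

$$A \vdash A$$

- Cut-rule

$$\dfrac{\Gamma \vdash \Delta, A \qquad A, \Sigma \vdash \Pi}{\Gamma, \Sigma \vdash \Delta, \Pi}$$

- Multiplicative (Parallel)

“,” (parallel)-right

$$\dfrac{\Gamma \vdash \Delta, A \qquad \Sigma \vdash \Pi, B}{\Gamma, \Sigma \vdash \Delta, \Pi, A, B}$$

- Linear Implication

⊸-left

$$\dfrac{\Gamma \vdash \Delta, A \qquad B, \Sigma \vdash \Pi}{A \multimap B, \Gamma, \Sigma \vdash \Delta, \Pi}$$

⊸-right

$$\dfrac{A, \Gamma \vdash \Delta, B}{\Gamma \vdash \Delta, A \multimap B}$$

- Weakening

Weakening-right

$$\dfrac{\Gamma \vdash \Delta, A}{\Gamma \vdash \Delta}$$

(Note that the right commas are the parallel conjunctions in our sequent
calculus formulation, and the above weakening rule is a derived rule in the
standard weakening rule of linear logic (cf. [19]).)

- Modality

!-left (dereliction-left)

$$\dfrac{A, \Gamma \vdash \Delta}{!A, \Gamma \vdash \Delta}$$

!-left (contraction-left)

$$\dfrac{!A, !A, \Gamma \vdash \Delta}{!A, \Gamma \vdash \Delta}$$

- Quantitative Modality

∇-left

$$\dfrac{A, \Gamma \vdash \Sigma}{\nabla A, \Gamma \vdash \nabla \Sigma}$$

∇-right

$$\dfrac{\Gamma \vdash \Sigma, !A}{\Gamma \vdash \Sigma, A, \nabla !A}$$


Appendix II


Scope of the DIO Model and Its Limitation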


Pathway Model and Phenomenon in the Real World Our model for Drug 
Interaction Ontology, the triadic molecular interaction model, can be considered 
as an atomic component of the pathway model, which is used to describe not only 
biological phenomena but also the mechanisms of drug action. It is often used to 
explain the causality of drug action, side effects and other inducible phenomena. 
A biological reaction in reality, however, is very complex, and a pathway model 
itself is a kind of abstraction from information in nature. They both constitute 
only part of the full stream of events, being provided to explain phenomena of 
particular interest. In a pathway model for a given phenomenon, disregarded 
reactions are those with unknown associations, or with less influential effect. 
Obviously, undiscovered reactions are not included. 
Usually, the time scale of reactions in a given pathway is implicit. The time
scale, in general, corresponds to those of the experiments by which the
possibilities of reactions are identified and verified. These are mostly shorter than
a monthly or yearly range. Long-term reactions, such as the accumulation of
injury in mitochondrial DNA during the normal aging process, are disregarded.
Likewise, very short-time events at the quantum level (e.g. atomic, electronic) are
disregarded.

This model can be used for the (re)construction of a pathway model, by pro- 
viding conjunction (inference) rules. This approach is different from the pathway 
decomposition approach or the pathway first approach, where a molecular in- 
teraction is tightly bound to its parent pathway model. An arbitrary molecular 
interaction represented by a triadic relation could be potentially integrated as a 
sub-process of different pathways, which would also be the case in the real world. 
Another aspect of the molecular interaction network in real world phenomena 
is its dynamic nature and complexity influenced by organism level regulations. 
There would be multiple feedback regulations and loops in the network map. 
Inferences using relative relations such as inhibit or facilitate, without using 
quantitative information and a time scale, would have limitations in such com- 
plicated network models. 


Scope of the Molecular Binding Model Our triadic molecular interaction 
model, described above, reflects the process of binding-based interactions, medi- 
ated by a transient complex. There also exist non-binding type processes such as 
a movement by natural diffusion, a bio-transformation without mediators, and 
so on. These are outside the scope of this paper. 


Molecular Level Granularity This interaction model is based on molecular
interactions, while our relational database schema includes location information
for each process participant. The relation is molecule part_of or located_in a
subcellular component, and/or molecule part_of or located_in tissue/organ
components. Some reactions, such as a transporter-mediated process, are location
sensitive. On the other hand, the type of reactions exemplified in this article is
not location sensitive. In our examples it is presumed that the participants and
the events are all located in the same field, and thus we disregard the attributes
of location. When we deal with location-sensitive reactions, however, inferences
across different levels of biological granularity would need further formalism.
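The location attribute and the "same field" presumption can be pictured with a small data-structure sketch. It is not the actual DIO database schema: the class `Participant`, the helper `same_field`, and the location value "plasma" are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Participant:
    """A process participant with an optional location attribute (hypothetical schema)."""
    molecule: str
    located_in: Optional[str] = None  # e.g. a tissue, organ, or subcellular component

def same_field(*participants: Participant) -> bool:
    """True when all participants share one known location, so location can be disregarded."""
    locations = {p.located_in for p in participants}
    return len(locations) == 1 and None not in locations

cpt11 = Participant("CPT-11", "liver")
ce = Participant("CE", "liver")
cpt11_elsewhere = Participant("CPT-11", "plasma")  # hypothetical location value
```

A check of this kind could guard the location-insensitive inferences of this paper, while location-sensitive reactions such as transporter-mediated processes would need the further formalism mentioned above.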


Application for Prediction of Real World Event As was pointed out
above, the pathway model itself has certain limitations in view of real world
events. In addition, it does not deal with numerical data, and is not capable
of a quantitative estimation of molecular events. We made some abstraction
from such information by introducing relative relations in terms of “bind_more”.
The relative relations are also embedded in the semantics of the triadic molecular
interaction itself: the emergence of an input triggers the execution of a process
and causes the emergence of a new product (output).
Instead of dealing with numerical information, this model deals with a
qualitative change of amount by providing the semantics of a relative change to the
relations, such as activation (relatively increased level of execution) or inhibition
(relatively decreased level of execution). For a more precise prediction and for
more complicated pathway network models (such as those with loops), the use of
numerical data would be important. A practical solution might be a cooperative
inference between these methods and simulation methods.